Validator Logic (Do you mean section?)

ostanley commented 5 years ago

We want the validator to fail completely in some cases and partially in others.

Completely: incorrect # of fiducials Some tolerance failure if X partial failures, fail completely (~25%) misordering of the fiducials inside the files

Partially: Incorrect capitalization Incorrect spelling L-R fiducial

When throwing a complete failure still throw out as many errors as possible. Any thoughts @jclauneuro @tkkuehn

kaitj commented 5 years ago

Had an additional thought and it might just be a temporary workaround to the issues we are currently having. Could we write the validations to assume that the templates are always correct (ie. it doesn't actually check the templates) and only looks at the user submitted fiducials?

ostanley commented 5 years ago

Then how would it compare the names? Just from our list? Long term I think it should maybe get the list from the template itself but short term this seems like a good option.

On Thu, Mar 21, 2019 at 9:50 AM Jason Kai notifications@github.com wrote:

Had an additional thought and it might just be a temporary workaround to the issues we are currently having. Could we write the validations to assume that the templates are always correct (ie. it doesn't actually check the templates) and only looks at the user submitted fiducials?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Park-Patrick/fidvalidator/issues/27#issuecomment-475235919, or mute the thread https://github.com/notifications/unsubscribe-auth/ABkpiCqGWiOy2GokDsfv2an3F1ltt9yHks5vY44igaJpZM4b9JeI .

kaitj commented 5 years ago

The names are being compared to a hard-coded list (I believe). I think that was a result of the fiducial names being static so the comparison will still happen for the submitted fiducials. The assumption would be that the fiducials are in the correct order in the template and even if the naming is wrong, the fiducial coordinates are still correct.

tkkuehn commented 5 years ago

(Sorry for the wall of text here, I started writing my thoughts and they ballooned)

I think it's worth discussing what we precisely mean by failing completely vs. partially.

I agree that the most valuable behaviour would be to communicate as many failures as possible - the sticking point just seems to be at what point we stop trying to parse the file.

It seems to me that there are two major steps: first, we want to make sure that the fcsv is formatted properly, then we want to see if there are any problems with the fiducial placements.

Starting with the fcsv formatting, we can start by looking at the fcsv format and take note of any problems but then keep going.

I think a reasonable approach would be to read the first (up to) 32 rows in the fcsv, and check for each of the following issues in each row:

Does the row have the right number of fields?
- This makes interpreting the fields kind of a crapshoot because we don't know for sure which fields are missing, where fields were added, or if the fields we need are there at all
- We could try to look at the indices where we normally expect the fields we care about to be, but I think this is a big enough problem that we should just stop trying to parse the row here in this case
Is the label correct?
- Could either not be an integer, or the incorrect integer based on the index of the row we're looking at
- We can keep going in this case, but it definitely makes the fcsv invalid
- If the label isn't an integer, we can't check its correspondence with the row's desc
Is the desc known?
- Could be a known desc with incorrect capitalization, which we should report but doesn't stop us from any further validation
- Could also be a totally unrecognized desc, which means we can't check its correspondence with the row's label and the fcsv is invalid
Do the row's label and desc match?
- Can't check if 2 or 3 fails
- If not, the fcsv is invalid
Are the values of x, y, and z valid?
- Could be infinite or not a number that can be parsed, either one of which means the fcsv is invalid.

If after this process we find that there are not exactly 32 rows, we should report that as well and stop here. We can also report which fiducials we didn't find. If any of the above issues are present, we should stop here and report all the problems.

If no major formatting issues were found, we know the fcsv is at least formatted properly. Now we can move on to analyzing the fiducial locations, checking for the following issue:

Are any L-R fiducials likely reversed, based on their locations in the reference template?

I don't know of any other fiducial placement errors we can check for automatically, but if there were we could do that at this point, then finish and report any we find or conclude that the fcsv is valid and show the distance metrics.

Let me know if this makes sense and if there's anything to add or think about differently!

kaitj commented 5 years ago

I think it's worth discussing what we precisely mean by failing completely vs. partially.

I think what we decided on here was a complete failure was where we stop trying to parse the file while a partial failure we would still continue to parse and perform the computation.

I agree that the most valuable behaviour would be to communicate as many failures as possible - the sticking point just seems to be at what point we stop trying to parse the file.

For this point, we haven't come up with a good percentage yet, but I was thinking with 32 fiducials, probably somewhere around 1/4 of those failing (so about 8) would result in us stopping to parse the files. This could maybe even be less than 8. Should be a point of discussion.

It seems to me that there are two major steps: first, we want to make sure that the fcsv is formatted properly, then we want to see if there are any problems with the fiducial placements.

Starting with the fcsv formatting, we can start by looking at the fcsv format and take note of any problems but then keep going.

I think a reasonable approach would be to read the first (up to) 32 rows in the fcsv, and check for each of the following issues in each row:

I agree with this being a good first step.

Does the row have the right number of fields?

This makes interpreting the fields kind of a crapshoot because we don't know for sure which fields are missing, where fields were added, or if the fields we need are there at all

We could try to look at the indices where we normally expect the fields we care about to be, but I think this is a big enough problem that we should just stop trying to parse the row here in this case

I'm not sure if this is something we would need to check as the fcsv is generated through Slicer, which I think populates all of the required fields. This may be somethign we should check by trying to generate an empty fcsv file. The only way columns would be lost is this is done manually after the it was saved from Slicer.

Is the label correct?

Could either not be an integer, or the incorrect integer based on the index of the row we're looking at

We can keep going in this case, but it definitely makes the fcsv invalid

If the label isn't an integer, we can't check its correspondence with the row's desc

I think in this situation, it should probably be a "complete failure". We can't gurantee that the fiducial this label belongs to is correct and would affect the downstream fiducials.

Is the desc known?

Could be a known desc with incorrect capitalization, which we should report but doesn't stop us from any further validation Agreed with this point.

Could also be a totally unrecognized desc, which means we can't check its correspondence with the row's label and the fcsv is invalid In this situation, could we not continue with the computation with the assumption everything else is correct (but still throw a warning/error message to the user to double check?)

Do the row's label and desc match?

Can't check if 2 or 3 fails

If not, the fcsv is invalid

If 2 fails, it should just be a complete failure anyways I think (see point above). If 3 fails, it can be an error message/warning.

Are the values of x, y, and z valid?

Could be infinite or not a number that can be parsed, either one of which means the fcsv is invalid.

If after this process we find that there are not exactly 32 rows, we should report that as well and stop here. We can also report which fiducials we didn't find. If any of the above issues are present, we should stop here and report all the problems.

If no major formatting issues were found, we know the fcsv is at least formatted properly. Now we can move on to analyzing the fiducial locations, checking for the following issue:

Are any L-R fiducials likely reversed, based on their locations in the reference template?

I think Daniel previously wrote something to check for reversal of fiducials during the previous Brainhack coding sprint.

I don't know of any other fiducial placement errors we can check for automatically, but if there were we could do that at this point, then finish and report any we find or conclude that the fcsv is valid and show the distance metrics.

One point to think about (and I think this is open in another issue so something to discuss there) is how we want to display the distance metrics.

Anyways, just my thoughts early this morning. Hopefully they make sense.

tkkuehn commented 5 years ago

I'm not sure if [the number of fields] is something we would need to check as the fcsv is generated through Slicer, which I think populates all of the required fields. This may be somethign we should check by trying to generate an empty fcsv file. The only way columns would be lost is this is done manually after the it was saved from Slicer.

I think this is a way of handling text files that just aren't (f)csvs at all and got uploaded for whatever reason, which it would probably be good to be prepared for. On that note, we should probably make sure the system handles non-text files gracefully.

[Re: 4] If 2 fails, it should just be a complete failure anyways I think (see point above). If 3 fails, it can be an error message/warning.

It should be noted that it's possible for both 2 and 3 to succeed and this one to fail (although we could just roll it into 3 conceptually).

Everything else you said sounds reasonable to me!

kaitj commented 5 years ago

With capitalization not being important at this stage, should we leave out the "Incorrect capitalization" as a partial failure then?

ostanley commented 5 years ago

I think so. It doesn't make sense if we aren't enforcing it on our end.

On Mon, Apr 15, 2019 at 2:11 PM Jason Kai notifications@github.com wrote:

With capitalization not being important at this stage, should we leave out the "Incorrect capitalization" as a partial failure then?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Park-Patrick/fidvalidator/issues/27#issuecomment-483360254, or mute the thread https://github.com/notifications/unsubscribe-auth/ABkpiEP6reOhxJXkQ9423BuLARJ90Bahks5vhMDqgaJpZM4b9JeI .

kaitj commented 4 years ago

With fiducial subsets in mind (#74) along with other features implemented, is this still something we need to worry about? Most of the initial errors / warnings have either been implemented or is no longer relevant (ie. misordering of fiducials).

kaitj commented 3 years ago

Closing this issue. Most recent implementation seem to address or make some of the concerns irrelevant. Can revisit this later if there are certain things we want to address.

afids / afids-validator

Validator Logic (Do you mean section?) #27