cedadev / checksit

File-checking made simple
BSD 3-Clause "New" or "Revised" License
1 stars 1 forks source link

Ncas image v1.0 #36

Closed Shanrahan16 closed 9 months ago

Shanrahan16 commented 1 year ago

Adding checks for the NCAS Image Standard v1.0

Shanrahan16 commented 1 year ago

I am mostly happy with this, a few points:

  • how would images written to an alternative image standard fair? In the reader you only keep tags that are expected in the image standard
  • what happens if one of the tags expected is not in the image file? (e.g. someone forgot to include the XMP-photoshop:Headline tag)
  • the validate_orcid_ID rule could return multiple identical errors, or if the string being checked is <32 characters long the function errors out
  • similarly, if someone forgets to include the space in the Relation tag the relation_url_checker function will error out
  • not sure this is for right now, but the latitude and longitude rule_funcs could be generalised into one function, where the maximum allowable value (90 or 180 in this case) is passed through to this function (similar to the string_of_length rule_func). This more generalised approach could mean this function is more widely useful in the future, rather than writing a new rule_func for every max value to check against

    • you could go one further and change it into a value_range check, and check the value against a minimum and maximum allowable range (e.g. -90 to 90). This could be useful in the future if you wanted to check a non-symmetrical(?) range
  • again, not sure this is for right now, but the _map_type_rule function in rules.py could be re-written so it checks early on if the rule should return an error or warning. For example, the conditionally executed code for rule-func (from line 68) is identical to that for rule-func-warning (from line 74), except for which list the output from the rule is written to. We could reduce duplication of code, it would be easier to add in type-rule-warning in the future if needed, and if any new types of rules come along it there would be less code needed to implement both an error and a warning version
  1. I think we're only interested in the tags outlined in the NCAS image standard?
  2. It fails, e.g 01. [global-attributes:**************:XMP-photoshop:Instructions]: Attribute 'XMP-photoshop:Instructions' does not exist.
  3. I have combined the if statements and reordered so the function won't error out if <32 characters
  4. It will now raise an exception if there's no space in relation.

I'll look into the rest

joshua-hampton commented 1 year ago
  1. For NCAS-IMAGE specific checks, yes, but we don't want to limit checksit so that it couldn't do checks for other standards (whether they currently exist or not). 2 & 3. Looks good.
  2. Could this be a checksit error, rather than a Python error that causes checksit to exit? If I remember correctly (I don't have a copy of the standard in front of me), the value for the tag must be of the form "Relation URL", so it not being in that format should be reported by checksit.
    • You already have a separate URL checker function, you could have a relation_check rule that firstly checks the format of the tag, and then if that passes calls the URL checker function with the URL from the Relation tag
Shanrahan16 commented 1 year ago
  1. For NCAS-IMAGE specific checks, yes, but we don't want to limit checksit so that it couldn't do checks for other standards (whether they currently exist or not). 2 & 3. Looks good.
  2. Could this be a checksit error, rather than a Python error that causes checksit to exit? If I remember correctly (I don't have a copy of the standard in front of me), the value for the tag must be of the form "Relation URL", so it not being in that format should be reported by checksit.

    • You already have a separate URL checker function, you could have a relation_check rule that firstly checks the format of the tag, and then if that passes calls the URL checker function with the URL from the Relation tag
  1. I've chatted to Graham, and as we're specifying that the standard we're checking against is NCAS image v1.0, it's fine. Unless there's something we've missed?
joshua-hampton commented 1 year ago

It's more for future use, so that one day checksit could be used for images that want to match a different standard in a scenario that has nothing to do with NCAS or CEDA. It could be similar if future versions of NCAS-Image have different tags - the readers are not standard specific, they are chosen by the file extention type. At this time, I doubt if anyone is planning on using checksit for non-NCAS-IMAGE images, so I wouldn't say this must be addressed now, but it's not something that I would want to ignore forever.

gap736uk commented 1 year ago

Ah - so @joshua-hampton, you're on about how the reader is constructed in this case, not the actual checks... that ideally we want to look to handle any tags that are included in the metadata... and the checker for a given standard happens to examine a given set of these for conformancy.... and that given set is defined in the spec file.

joshua-hampton commented 1 year ago

Yes, sorry I wasn't clear on that!

joshua-hampton commented 1 year ago

One final bit from me (I think), in the url_checker function, two separate requests are made to see if the value is reachable, one within the try clause, and another within the else clause (and using two different modules to do so). Data from the try clause gets passed onto the else clause (if the else clause is executed), so I'm not sure a second request is necessary?

Shanrahan16 commented 1 year ago

One final bit from me (I think), in the url_checker function, two separate requests are made to see if the value is reachable, one within the try clause, and another within the else clause (and using two different modules to do so). Data from the try clause gets passed onto the else clause (if the else clause is executed), so I'm not sure a second request is necessary?

Hi Josh, I'm just looking at this now. The reason it ended up like that was because requests.get() didn't work with the try bit. I'll see if I can remove requests.get() and just use urlopen()

Shanrahan16 commented 1 year ago

One final bit from me (I think), in the url_checker function, two separate requests are made to see if the value is reachable, one within the try clause, and another within the else clause (and using two different modules to do so). Data from the try clause gets passed onto the else clause (if the else clause is executed), so I'm not sure a second request is necessary?

Hi Josh, I'm just looking at this now. The reason it ended up like that was because requests.get() didn't work with the try bit. I'll see if I can remove requests.get() and just use urlopen()

They appear to be doing different things:

test 1 <http.client.HTTPResponse object at 0x7f23c1778dc0>
test 2 <Response [200]>
test 1 <http.client.HTTPResponse object at 0x7f23c1778e50>
test 2 <Response [200]>
test 1 <http.client.HTTPResponse object at 0x7f23c1779210>
test 2 <Response [200]>
Shanrahan16 commented 1 year ago

One final bit from me (I think), in the url_checker function, two separate requests are made to see if the value is reachable, one within the try clause, and another within the else clause (and using two different modules to do so). Data from the try clause gets passed onto the else clause (if the else clause is executed), so I'm not sure a second request is necessary?

Hi Josh, I'm just looking at this now. The reason it ended up like that was because requests.get() didn't work with the try bit. I'll see if I can remove requests.get() and just use urlopen()

They appear to be doing different things:

test 1 <http.client.HTTPResponse object at 0x7f23c1778dc0>
test 2 <Response [200]>
test 1 <http.client.HTTPResponse object at 0x7f23c1778e50>
test 2 <Response [200]>
test 1 <http.client.HTTPResponse object at 0x7f23c1779210>
test 2 <Response [200]>

This should be resolved now 👍

joshua-hampton commented 10 months ago

After all that testing, I think the reason for failures is that exiftool doesn't exist on the GitHub workflow image