Closed Shanrahan16 closed 9 months ago
I am mostly happy with this, a few points:
- how would images written to an alternative image standard fare? In the reader you only keep tags that are expected in the image standard
- what happens if one of the tags expected is not in the image file? (e.g. someone forgot to include the XMP-photoshop:Headline tag)
- the validate_orcid_ID rule could return multiple identical errors, and if the string being checked is fewer than 32 characters long the function errors out
- similarly, if someone forgets to include the space in the Relation tag, the relation_url_checker function will error out
- not sure this is for right now, but the latitude and longitude rule_funcs could be generalised into one function, where the maximum allowable value (90 or 180 in this case) is passed through to this function (similar to the string_of_length rule_func). This more generalised approach could make the function more widely useful in the future, rather than writing a new rule_func for every max value to check against
- you could go one further and change it into a value_range check, and check the value against a minimum and maximum allowable range (e.g. -90 to 90). This could be useful in the future if you wanted to check a non-symmetrical(?) range (a sketch of this is included after this list)
- again, not sure this is for right now, but the _map_type_rule function in rules.py could be re-written so it checks early on whether the rule should return an error or a warning. For example, the conditionally executed code for rule-func (from line 68) is identical to that for rule-func-warning (from line 74), except for which list the output from the rule is written to. We could reduce duplication of code, it would be easier to add a type-rule-warning in the future if needed, and if any new types of rules come along there would be less code needed to implement both an error and a warning version (a sketch of this refactor is also included below)
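
For illustration, a minimal sketch of the generalised value_range check suggested above might look like the following. The function name, signature and error wording are assumptions for the example, not checksit's actual rule_func API:

```python
# Illustrative sketch only: assumes a rule_func receives the value being
# checked plus its bounds, and returns a list of error messages.

def value_in_range(value, minimum, maximum, label=""):
    """Check that a numeric value falls within [minimum, maximum]."""
    errors = []
    try:
        number = float(value)
    except (TypeError, ValueError):
        errors.append(f"{label} '{value}' is not a number.")
        return errors
    if not minimum <= number <= maximum:
        errors.append(
            f"{label} '{value}' is outside the allowed range [{minimum}, {maximum}]."
        )
    return errors


# Latitude and longitude then become two configurations of the same rule,
# and a non-symmetrical range is just another pair of bounds.
lat_errors = value_in_range("91.3", -90, 90, label="Latitude")
lon_errors = value_in_range("45.0", -180, 180, label="Longitude")
```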
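
Similarly, a rough sketch of the error/warning refactor of _map_type_rule described above, choosing the destination list up front so the rest of the logic is shared (names are illustrative, not the actual checksit code):

```python
# Illustrative sketch only: decide early whether the rule output goes to
# errors or warnings, so the rule-func and rule-func-warning branches can
# share one block of code.

def record_rule_output(rule_name, rule_output, errors, warnings):
    """Append a rule's output to either errors or warnings based on its name."""
    destination = warnings if rule_name.endswith("-warning") else errors
    destination.extend(rule_output)


errors, warnings = [], []
record_rule_output("rule-func", ["value is wrong"], errors, warnings)
record_rule_output("rule-func-warning", ["value looks odd"], errors, warnings)
print(errors)    # ['value is wrong']
print(warnings)  # ['value looks odd']
```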
01. [global-attributes:**************:XMP-photoshop:Instructions]: Attribute 'XMP-photoshop:Instructions' does not exist.
Added if statements and reordered so the function won't error out if <32 characters. I'll look into the rest
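
For reference, the kind of length-first guard being described might look something like this; it is only an illustration of the approach, not the actual change, and the expected ORCID URL form and message wording are assumptions:

```python
# Illustrative sketch only: check the length and prefix before doing anything
# that indexes into the string, so short values give a checksit error rather
# than a Python error.

def validate_orcid_id(value, label=""):
    """Report an error for values that are not full ORCID URLs."""
    errors = []
    expected_prefix = "https://orcid.org/"
    expected_length = len(expected_prefix) + 19  # 16 digits plus 3 hyphens
    if (
        not isinstance(value, str)
        or len(value) != expected_length
        or not value.startswith(expected_prefix)
    ):
        errors.append(f"{label} '{value}' is not a valid ORCID URL.")
        return errors
    # Only reached for values of the right length and prefix; further checks
    # on the digits themselves can safely index into the string here.
    return errors
```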
- For NCAS-IMAGE specific checks, yes, but we don't want to limit checksit so that it couldn't do checks for other standards (whether they currently exist or not).
- 2 & 3: Looks good.
Could this be a checksit error, rather than a Python error that causes checksit to exit? If I remember correctly (I don't have a copy of the standard in front of me), the value for the tag must be of the form "Relation URL", so it not being in that format should be reported by checksit.
- You already have a separate URL checker function; you could have a relation_check rule that first checks the format of the tag, and then, if that passes, calls the URL checker function with the URL from the Relation tag (a sketch of this is included below)
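
A minimal sketch of that suggestion, assuming the tag value should be a relation keyword and a URL separated by a single space, and that an existing URL-checking rule can be called for the second part; the function names and signatures here are assumptions, not checksit's actual API:

```python
# Illustrative sketch only: report a checksit error when the space is missing,
# and only pass a well-formed URL part on to the existing URL check.

def relation_check(value, url_checker, label=""):
    """Check a Relation tag of the assumed form '<relation> <URL>'."""
    errors = []
    parts = value.split(" ", 1) if isinstance(value, str) else []
    if len(parts) != 2 or not parts[0] or not parts[1]:
        errors.append(f"{label} '{value}' is not of the form '<relation> <URL>'.")
        return errors
    # Format looks right, so hand the URL part to the separate URL checker.
    errors.extend(url_checker(parts[1]))
    return errors
```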
It's more for future use, so that one day checksit could be used for images that want to match a different standard in a scenario that has nothing to do with NCAS or CEDA. It could be similar if future versions of NCAS-Image have different tags - the readers are not standard specific, they are chosen by the file extension type. At this time, I doubt anyone is planning on using checksit for non-NCAS-IMAGE images, so I wouldn't say this must be addressed now, but it's not something that I would want to ignore forever.
Ah - so @joshua-hampton, you're on about how the reader is constructed in this case, not the actual checks... that ideally we want to handle any tags that are included in the metadata... and the checker for a given standard happens to examine a given set of these for conformance... and that given set is defined in the spec file.
Yes, sorry I wasn't clear on that!
One final bit from me (I think): in the url_checker function, two separate requests are made to see if the value is reachable, one within the try clause and another within the else clause (and using two different modules to do so). Data from the try clause gets passed on to the else clause (if the else clause is executed), so I'm not sure a second request is necessary?
Hi Josh, I'm just looking at this now. The reason it ended up like that was because requests.get() didn't work with the try bit. I'll see if I can remove requests.get() and just use urlopen()
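
For what it's worth, a single-request version using only urlopen might look roughly like this (the error wording and signature are illustrative, not the actual checksit function):

```python
# Illustrative sketch only: one request is enough, with failures caught and
# reported as checksit errors rather than raising.

from urllib.error import URLError
from urllib.request import urlopen

def url_checker(value, label=""):
    """Report an error if the URL in value cannot be reached."""
    errors = []
    try:
        urlopen(value, timeout=10)
    except (URLError, ValueError) as exc:
        errors.append(f"{label} URL '{value}' is not reachable ({exc}).")
    return errors
```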
They appear to be doing different things:
test 1 <http.client.HTTPResponse object at 0x7f23c1778dc0>
test 2 <Response [200]>
test 1 <http.client.HTTPResponse object at 0x7f23c1778e50>
test 2 <Response [200]>
test 1 <http.client.HTTPResponse object at 0x7f23c1779210>
test 2 <Response [200]>
This should be resolved now 👍
After all that testing, I think the reason for failures is that exiftool doesn't exist on the GitHub workflow image
Adding checks for the NCAS Image Standard v1.0