EDIorg / ECC

ECC = EML Congruence Checker

Possible new check: use of function="information" #24

Open mobb opened 5 years ago

mobb commented 5 years ago

Case A: If an eml doc does contain a distribution tree at the entity level, the first url must contain function="download", or omit the function attribute, which defaults to "download". If the first url instead has function="information", that eml doc should fail the checker, whether the data file for that entity is uploaded thru the url in the eml doc or manually using the form. Reason: EML BP says the url for an entity should point to data, not information.
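A rough sketch of the Case A check, written in Python for illustration only and not taken from the ECC code base. The function name `check_first_url_function`, the `ENTITY_TAGS` list, and the "valid"/"error" status strings are assumptions, and it assumes the usual EML layout of `dataset/<entity>/physical/distribution/online/url`.

```python
import xml.etree.ElementTree as ET

# Entity element names assumed for this sketch (children of <dataset>).
ENTITY_TAGS = ("dataTable", "spatialRaster", "spatialVector",
               "storedProcedure", "view", "otherEntity")


def check_first_url_function(eml_text):
    """Return (entity_name, status) for every entity that has a url.

    status is "valid" when the first url has no function attribute or
    function="download", and "error" otherwise (e.g. "information").
    Entities with no url at all (Case B) are skipped here.
    """
    root = ET.fromstring(eml_text)
    results = []
    for dataset in root.iter("dataset"):
        for entity in dataset:
            if entity.tag not in ENTITY_TAGS:
                continue
            name = entity.findtext("entityName", default="(unnamed)")
            # find() returns the first matching url in document order.
            url = entity.find("physical/distribution/online/url")
            if url is None:
                continue
            # A missing function attribute defaults to "download".
            function = url.get("function", "download")
            status = "valid" if function == "download" else "error"
            results.append((name, status))
    return results
```

Because the check only reads the metadata, it gives the same answer whether the data file is later fetched from the url or uploaded manually using the form.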

Case B: If an eml doc does not contain a distribution tree at the entity level, then there is no url element, and no function attribute. In this case the data file for that entity would have to be uploaded manually using the form, and pasta would be filling in the whole distribution tree, not just changing the url. I do not know if pasta handles this case; I have not run a test doc this way.
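To make Case B easier to test, here is a small Python sketch that only lists the entities lacking an entity-level distribution tree. It does not decide whether they should pass or fail, since that is still open. The function name `entities_without_distribution` and the `ENTITY_TAGS` list are assumptions for illustration, not part of ECC or pasta.

```python
import xml.etree.ElementTree as ET

# Entity element names assumed for this sketch (children of <dataset>).
ENTITY_TAGS = ("dataTable", "spatialRaster", "spatialVector",
               "storedProcedure", "view", "otherEntity")


def entities_without_distribution(eml_text):
    """Return the names of entities that have no physical/distribution."""
    root = ET.fromstring(eml_text)
    missing = []
    for dataset in root.iter("dataset"):
        for entity in dataset:
            if entity.tag not in ENTITY_TAGS:
                continue
            if entity.find("physical/distribution") is None:
                name = entity.findtext("entityName", default="(unnamed)")
                missing.append(name)
    return missing
```

Any entity reported here has no url element and no function attribute, so its data file could only arrive through a manual upload.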

To save others from a dead end in this maze, Margaret and I did explore the case of pasta correcting the function attribute in the url element. However, that would be a change of pattern. Presently pasta does not correct other user errors, even when it could be assumed what the error and fix are. I suggest we not change that pattern.

mobb commented 5 years ago

The original email thread, with examples:

On Mon, Jun 3, 2019 at 9:43 AM Duane Costa dcosta.lternet@gmail.com wrote:

Hi Gastil,

Thanks for providing an excellent description of the issue and a test data package on portal-d. I downloaded the EML and tried it out with function="information" and with/without manual upload so I could reproduce the behavior. It seems that better error reporting is needed here, either through a quality check or a more informative error message from PASTA. The current message is:

   An entity failed to download successfully: packageId: knb-lter-mcr.999.1; entity name: This_is_a_test_entity_name; entity id: bcba6aa9830c4f64090882dd2e8acbee

Also, I'm not entirely sure what the behavior should be when function="information" in the metadata is combined with a manual upload. Those two things seem somewhat incompatible so perhaps this too should be flagged as an error. (??)

I'll enter all this info into the bug tracker.

Duane

On Sat, Jun 1, 2019 at 3:44 PM Gastil Gastil-Buhl gastil.gastil-buhl@ucsb.edu wrote:

Hi Duane,

No action item. Just fyi.

I just now diagnosed an error on my end, with my EML file, using evaluate on portal. I share it now, in case someone else has this error. It is trivial to fix, but for me was not trivial to solve.

The error was caused by having function="information" instead of function="download" as an attribute of the url element used to download the data file.

Now the strange part. This not only prevented the entity from downloading via the url in the eml, it actually caused the same error when I opted to manually select the file, using the form option. How strange is that?

I created an MRE test case, knb-lter-mcr.999 in portal-d. I uploaded revision 1 with function="download". Then I made a revision 2 with function="information". Revision 2 fails both in the usual load-from-url mode and with the manual load option from portal's form.

In the test file, I used a direct url rather than one thru my optional-self-id system, to simplify the example. And I normally never use the manual upload. I was only using that for diagnosis. But that was the most strange part.

Now that my EML is fixed, having replaced function="information" with "download", my package evaluates fine. And this would not even have happened except I was uploading some hand-written EML from 2012, before we had the program writeEML.

Just stash this info in case someone else has problems that might be this.

Gastil