Closed jlegind closed 1 month ago
It would be good if we could see this in Specify
After discussing this with @bhsi-snm : Not sure how easy it is to do in OpenRefine. Perhaps @jlegind can conjure up a little utility program for adding this Source file column? Then I'll repurpose a field in Specify to map it to. Or perhaps investigate OpenRefine option?
As far as I can see, there isn't really a way for GREL to get the file name, or rather the OpenRefine project name. The only way I can think of is adding it manually. I would also recommend treating this as a tabular remark field (cf. #444) so we don't occupy any customizable text fields with it.
We already have a remarks column; the new column might be 'remark_source', which could hold a value such as: NHMD_PinnedInsects_20240119_15_40_RL_original.csv
Question: Should remark_date be the date that the export was made, or the date it was post-processed?
As a result from the implementation of #444 we already have a column "remark source", so I suggest you choose another name. As you can see, for tabular remarks, we need three columns:
For the specimen level remarks field, these fields are just prefixed "remark", so you get "remark source" and "remark date".
Actually, using the term "source" for the filename of the data is confusing here; maybe it's better to use "datafile".
So that means the following column names;
@bhsi-snm Do you approve of this proposal?
Since we have code ready for monitoring a directory, I could extend it to add "datafile_source" and "datafile_date" to the CSV export. This circumvents OpenRefine.
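A minimal sketch of what that extension could look like, not the actual monitoring script: the function name `add_datafile_columns`, the comma delimiter and the UTF-8 encoding are assumptions for illustration. Since it is still open which date "datafile_date" should hold, the date is passed in by the caller rather than derived.

```python
import csv
from pathlib import Path


def add_datafile_columns(src: Path, dst: Path, datafile_date: str) -> int:
    """Copy src to dst, appending datafile_source and datafile_date to each row.

    Hypothetical helper: assumes a comma-delimited, UTF-8 CSV with a header row.
    Returns the number of data rows written.
    """
    with src.open(newline="", encoding="utf-8") as fin, \
            dst.open("w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        # Keep the original column order and put the two new columns last.
        fieldnames = list(reader.fieldnames or []) + ["datafile_source", "datafile_date"]
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        count = 0
        for row in reader:
            row["datafile_source"] = src.name   # file name only, not the full path
            row["datafile_date"] = datafile_date  # which date to use is up to the caller
            writer.writerow(row)
            count += 1
    return count
```

Called on an export such as NHMD_PinnedInsects_20240119_15_40_RL_original.csv, every output row would carry that file name in "datafile_source", so a record in Specify can be traced back to its file.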
The "datafile_source" and "datafile_date" and "datafile_remarks" columns for the tabular remarks have been added through the monitoring script.
See issue #492 on conditionally adding values in the remarks columns.
The monitoring script was not entirely implemented before Jan left, so this has been made part of the post-processing GREL script instead (ticket #506).
What is the issue?
It would help in debugging and 'housekeeping' if the imported Digi app records had their original source file name attached in a separate field:
Source "NHMD_PinnedInsects_20231121_16_16_SS_original.csv"
Detailed description of the issue.
If there is a discrepancy between imported records in Specify and what is in the 4.Archive directory, then having the source path would be a massive help.
Why is it needed/relevant?
We gain a certain amount of future proofing in that it addresses issues like the one above and anticipates unforeseen problems.
Give scenario(s) of why and when this could be relevant.
If a curator discovers something in Specify that is a little off the mark, we can go all the way back to the source to investigate. We have already agreed that the post-processing GREL scripts should have their own version numbers, since they evolve with business needs. Adding a source field ties neatly into this, as it makes forensics much easier.
Estimate level of effort required.
easy
What could be the challenges?
There does not seem to be a way to automatically add the file name to a column in OpenRefine. That means it has to be added manually in the OpenRefine interface, which is a trivial task.
What documentation is required?
The documentation file "import_protocol_postProcessing.md" will need to be updated.