brittnylapierre closed 2 years ago
Having both #53 and this PR worked on at the same time is likely making things messy. How about doing both in this PR: improving the DEV environment as well as testing the MARC update?
Here is what the old "./run" script would allow, which is to run a shell within the container so you can test whatever commands you want.
russell@russell-XPS-13-7390:~/git/cihm-metadatabus$ docker-compose run cihm-metadatabus bash
Creating cihm-metadatabus_cihm-metadatabus_run ... done
tdr@29d3c127dbec:~$ ls -l /var/log/tdr/
total 0
tdr@29d3c127dbec:~$ hammer2
tdr@29d3c127dbec:~$ ls -l /var/log/tdr/
total 4
-rw-r--r-- 1 tdr tdr 113 Sep 12 18:20 root.log
tdr@29d3c127dbec:~$ cat /var/log/tdr/root.log
2022/09/12 18:20:26 - INFO {CIHM.TDR} [CIHM::Meta::Hammer2::hammer] Hammer2 skip=0 limit=9 maxprocs=4 timelimit=
tdr@29d3c127dbec:~$
Note the change to the README -- I suggest keeping the logfile outside of the container, so you can look at it (tail it, whatever) from outside the container as well.
russell@russell-XPS-13-7390:~/git/cihm-metadatabus$ ls -la log
total 12
drwxrwxr-x 2 1117 1117 4096 Sep 12 18:20 .
drwxrwxr-x 9 russell russell 4096 Sep 12 18:13 ..
-rw-r--r-- 1 1117 1117 113 Sep 12 18:20 root.log
russell@russell-XPS-13-7390:~/git/cihm-metadatabus$ cat log/root.log
2022/09/12 18:20:26 - INFO {CIHM.TDR} [CIHM::Meta::Hammer2::hammer] Hammer2 skip=0 limit=9 maxprocs=4 timelimit=
russell@russell-XPS-13-7390:~/git/cihm-metadatabus$
Next steps would be to point to the CouchDB within the Access-Platform environment, and document that accordingly (in comments in env-dist or wherever you feel appropriate). The current URLs point to jarlsberg, which is for production data.
A note on the change in the logic.
In the previous version, there could be multiple 260 fields. Only the first one was used to try to extract a pubmin and pubmax. Subsequent ones would still be part of the text "pu" field, displayed by CAP.
With the new logic, only the first 260 field will be used, and only if there isn't a 264 field (i.e. it is one or the other, with no support for both and no support for repetition of the field).
Is this the intended logic, and has that been checked with Natalie?
Note: We can change the logic later if it turns out to not be what is needed. We should be as transparent as we can about how these fields are being interpreted by the custom "flatten" function.
An alternative logic would accept repeating fields for the text display, but use only the first date for the pubmin/pubmax.
Both of these pages suggest the fields are repeatable:
https://www.loc.gov/marc/bibliographic/bd260.html https://www.loc.gov/marc/bibliographic/bd264.html
In the future we plan to support multiple date ranges rather than a single pubmin/pubmax, at which point we can change the logic to support as many dates as provided in the MARC record.
# Collect every 264 field first, then every 260 field.
my @publisharray;
push @publisharray, $record->field('264');
push @publisharray, $record->field('260');
foreach my $publishfield (@publisharray) {
    # Every field is added to the "pu" display text.
    addArray( \%flat, 'pu', $publishfield->as_string() );
    # Only the first field with a subfield "c" sets pubmin/pubmax.
    my $date = $publishfield->subfield('c');
    if ( defined $date && !defined $flat{'pubmin'} ) {
        $flat{'pubmin'} = iso8601( $date, 0 );
        $flat{'pubmax'} = iso8601( $date, 1 );
    }
}
Thoughts? This would of course need to be tested, as this is only a suggestion about logic, using perl-ish stuff as pseudo-code.
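To make the intended behaviour easier to check, here is a minimal, self-contained sketch of the same idea that runs without MARC::Record. The fields are modelled as plain hashes, and the sample values, the `text` and `c` keys, and the `date_source` key are all hypothetical stand-ins. The raw subfield "c" value is recorded as-is rather than passed through iso8601(), whose behaviour is not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for MARC fields: each hash holds the tag, the
# display string, and an optional subfield "c" (the date statement).
my @fields = (
    { tag => '260', text => 'Toronto : Example Press, 1901.', c => '1901.' },
    { tag => '264', text => 'Montreal : Sample House',        c => undef   },
    { tag => '260', text => 'Ottawa : Demo Printers, 1905.',  c => '1905.' },
);

# Build one list with every 264 ahead of every 260, keeping record order
# within each tag.
my @publish = (
    ( grep { $_->{tag} eq '264' } @fields ),
    ( grep { $_->{tag} eq '260' } @fields ),
);

my %flat;
foreach my $field (@publish) {
    # Every field contributes to the display text.
    push @{ $flat{pu} }, $field->{text};

    # Only the first field that carries a date is used for the range.
    # The raw value is stored here; the real code would pass it through
    # iso8601() to fill pubmin and pubmax.
    if ( defined $field->{c} && !defined $flat{date_source} ) {
        $flat{date_source} = $field->{c};
    }
}

print "pu: $_\n" for @{ $flat{pu} };
print "date taken from: $flat{date_source}\n";
```

In this made-up record the 264 has no date, so all three strings land in "pu" (264 first) and the date comes from the first 260.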
@RussellMcOrmond I'll change it, then test again, then build an image so we can have Natalie test the change too.
Do you have any tools like https://xml-copy-editor.sourceforge.io/ or https://marcedit.reeset.net/ installed?
MarcEdit is written to the .NET APIs, so can run on Linux if you want.
russell@russell-XPS-13-7390:~$ which MarcEdit
/home/russell/bin/MarcEdit
russell@russell-XPS-13-7390:~$ cat /home/russell/bin/MarcEdit
#!/bin/bash
mono ~russell/bin/marcedit/MarcEdit.exe $@
https://marcedit.reeset.net/marcedit-linux-installation-instructions
This way you can generate some of your own test cases. We don't have "demo" or "test" type environments set up for the Metadatabus (i.e. we don't run more than one instance against multiple databases, etc.), so our "dev" testing is the primary testing before we make the updates available for everyone.
Took your logic and made it one or the other, not both (264 preferred; if it isn't there, then look for 260, as per Natalie's request).
I tested on my computer. I do have MarcEdit but wasn't able to format a test file without Natalie's help. Now that I have some, I should be able to do it in the future.
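For comparison, a minimal sketch of the one-or-the-other selection, again using plain hashes as hypothetical stand-ins for MARC fields. The helper name `select_publication_fields` and the sample values are illustrative only, and keeping every repeat of the chosen tag is an assumption; this is not the code from this PR.

```perl
use strict;
use warnings;

# Hypothetical helper: given field stand-ins (hashes with tag, text, and
# an optional subfield "c"), return the fields that would be used.
sub select_publication_fields {
    my @fields = @_;
    my @f264 = grep { $_->{tag} eq '264' } @fields;
    return @f264 if @f264;                        # 264 present: ignore 260
    return grep { $_->{tag} eq '260' } @fields;   # otherwise fall back to 260
}

my @both = (
    { tag => '260', text => 'Toronto : Example Press, 1901.', c => '1901.' },
    { tag => '264', text => 'Montreal : Sample House, 1902.', c => '1902.' },
);
my @only260 = grep { $_->{tag} eq '260' } @both;

my @picked_both = select_publication_fields(@both);
my @picked_260  = select_publication_fields(@only260);

print "both present -> $_->{tag}: $_->{text}\n" for @picked_both;
print "260 only     -> $_->{tag}: $_->{text}\n" for @picked_260;
```

With both tags present only the 264 is kept, so the 260 string would not reach "pu" at all, which is the difference from the combined-array approach.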
You may want to check with Natalie if that is the logic for the date, or logic for the text that shows up in the "about" field as well.
She may only be thinking of pubmin/pubmax.
The order I suggested uses the first date found, with 264 put into the array before 260, which would offer the same preference for 264.
Or, I can ask @nataliemacdonald in this PR directly.
Natalie,
Under the previous logic, we took the date from the first 260, but would copy any number of 260 fields into the "pu" field seen in the About on CAP, as 260 is a repeatable field.
The logic I was proposing takes any 264 fields and 260 fields, and builds a single array (264 first).
It then loops through them, adding all strings to "pu". It sets the pubmin/pubmax fields based on the first date it finds, which will come from a 264, and from a 260 only if no 264 supplied a date.
Did you want the "pu" field to only include one or the other, if both existed?
(Note: moving away from the single pubmin/pubmax to an array of date ranges is a later project that will include #50)
@RussellMcOrmond restored the logic - feel free to merge if happy today!
My first update to the metadata bus! Not tested yet - as I need to get it running on my computer.