One of the main difficulties that I (and at least one other early adopter) have run into is that the "xmlpath" variable set in "config.yaml" did not point to the location the script expects, so no XML-TEI files were found. As a consequence, no "xxx_metadata.csv" file is created, causing the next function to complain that it can't be found.
The underlying cause is most likely that no XML-TEI files were found. It would be good to add a check to "get_metadata" that warns the user when no XML-TEI files are found and tells them to adjust their "xmlpath" variable.
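A minimal sketch of what such a check could look like. The function name `find_tei_files` and the `*.xml` glob pattern are assumptions for illustration; the actual file lookup in "get_metadata" may be organised differently.

```python
import glob
import os
import warnings


def find_tei_files(xmlpath):
    """Return the XML files directly under xmlpath, warning if there are none.

    Hypothetical helper: the name and the "*.xml" pattern are assumptions,
    not the script's actual code.
    """
    teifiles = sorted(glob.glob(os.path.join(xmlpath, "*.xml")))
    if not teifiles:
        warnings.warn(
            f"No XML-TEI files found in '{xmlpath}'. "
            "Please adjust the 'xmlpath' variable in config.yaml."
        )
    return teifiles
```

Calling something like this at the start of "get_metadata" would surface the problem before the missing "xxx_metadata.csv" file trips up the next function.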
Given how the script now works (with the correct full path being generated based on the language and level indicated in the config file), maybe the variable "xmlpath" should be renamed to "basedir" or "workingdir" or something like that, because this is where everything starts from. (Careful not to interfere with the existing "workdir" parameter; but is it doing anything?)
I think we need to make some of the script's assumptions more explicit in the "HOWTO.md" file. Where are the repository folders? Where is the "worldcat" folder? Where do the results get written to? etc.
The "xmlpath" variable is now tested in get_settings.py. If the path doesn't exist, a warning is raised. I thought the test would fit better at the step where the variable is read from the config file.
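For readers following along, a path test of this kind might look roughly like the sketch below. The function name `check_xmlpath` and the idea that the settings arrive as a plain dictionary are assumptions; the actual code in get_settings.py may differ.

```python
import os
import warnings


def check_xmlpath(settings):
    """Warn and return False if settings["xmlpath"] is not an existing directory.

    Hypothetical helper: the name and the dict-based settings are assumptions.
    """
    xmlpath = settings.get("xmlpath")
    if xmlpath is None or not os.path.isdir(xmlpath):
        warnings.warn(
            f"The 'xmlpath' value {xmlpath!r} from config.yaml does not exist "
            "or is not a directory. Please adjust it."
        )
        return False
    return True
```

Note that a check like this only confirms that the directory exists; it does not detect an existing directory that contains no XML-TEI files.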