Open chreman opened 9 years ago
I will review with the team tomorrow what the best strategies are for creating new ctree/cmdirs. It may require a specific command.
I'm not hugely against empty folders tbh. An empty folder is a visible reminder of a paper or patent that matches the getpapers search, and it is more easily and quickly seen than a human-readable version of the JSON file (btw jq, which is typically used to produce one, is not installed in the VM).
Are the empty folders inconsistent? What problems do they cause?
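For readers without jq, a human-readable rendering of a results JSON file can be sketched with the Python standard library alone. This assumes Python is available on the VM, which the thread does not confirm; `pretty_json` and the sample record are hypothetical names chosen for illustration, not part of getpapers.

```python
import json


def pretty_json(text: str) -> str:
    """Return an indented, key-sorted rendering of a JSON string."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical record; real getpapers output has its own fields.
    sample = '{"title": "Example paper", "id": "example-0001"}'
    print(pretty_json(sample))
```

The same effect is available from the shell via `python -m json.tool <file>`, which ships with the standard library.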
We have not fully described what a CTree directory SHOULD or MUST look like. The current plan is to have a metadata.json file, but that hasn't been added yet. So I would argue that a CTree MUST have a metadata.json file, which acts as (a) a marker that this is a CTree, (b) a log of what has been done, and (c) a record of what the contents currently are.
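The three roles proposed for metadata.json (marker, log, contents listing) could be sketched as follows. Only the file name comes from the thread; the `log` and `contents` keys, the function names, and the timestamp format are assumptions made for illustration, since no schema has been defined.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

MARKER = "metadata.json"  # file name from the thread; the schema below is hypothetical


def is_ctree(directory: Path) -> bool:
    """(a) Marker: a directory counts as a CTree only if metadata.json exists."""
    return (directory / MARKER).is_file()


def write_metadata(directory: Path, action: str) -> dict:
    """Create or update metadata.json with a log entry and a contents listing."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / MARKER
    meta = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}
    # (b) Log: append what was done and when (UTC, ISO 8601).
    meta.setdefault("log", []).append(
        {"action": action, "time": datetime.now(timezone.utc).isoformat()}
    )
    # (c) Contents: every entry in the directory except the marker itself.
    meta["contents"] = sorted(p.name for p in directory.iterdir() if p.name != MARKER)
    path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta
```

With such a marker, a directory that holds only metadata.json is still recognisable as a CTree, so "empty" and "not a CTree" become distinguishable states.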
On Sun, Jul 12, 2015 at 12:43 PM, Ross Mounce wrote the comment quoted above (https://github.com/ContentMine/getpapers/issues/45#issuecomment-120710817).
Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069
It causes problems when using norma, which gives e.g.:
245 [main] DEBUG org.xmlcml.cmine.args.DefaultArgProcessor - ... No reserved files or directories: dir: dinosaurs-eumpc/PMC3633922
When I remove all empty folders, norma runs. So either norma must accept empty folders (but then we have a problem with the definition of a minimum ctree: what should norma put into such a folder? It could also leave it continuously empty), or getpapers must not create empty folders.
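The workaround of removing empty folders before running norma could be sketched like this. It assumes each ctree is a direct subdirectory of the project folder (as in `dinosaurs-eumpc/PMC3633922`); the function names are hypothetical and this is not part of getpapers or norma.

```python
from pathlib import Path
from typing import List


def empty_subdirs(project: Path) -> List[Path]:
    """List direct subdirectories of the project folder that contain nothing."""
    return sorted(d for d in project.iterdir() if d.is_dir() and not any(d.iterdir()))


def remove_empty_subdirs(project: Path) -> List[Path]:
    """Delete the empty subdirectories and return the paths that were removed."""
    removed = empty_subdirs(project)
    for d in removed:
        d.rmdir()  # rmdir only succeeds on an empty directory, so files are never lost
    return removed
```

Calling `empty_subdirs` first gives a dry run, which keeps a record of which search hits had neither PDF nor XML before anything is deleted.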
Is the Norma message an inconvenience or a ...? This means we need metadata.json or similar as a priority for identifying a ctree. So getpapers should really create this file.
Ah yes, also because quickscrape creates a result.json in each ctree, and getpapers an apiname_results.json, which gets overwritten with each search.
In cases where neither PDF nor XML is found, folders are created anyway. This may be irritating when interpreting results and when working with norma and other tools.