NaturalHistoryMuseum / scratchpads2

Scratchpads 2.0
http://scratchpads.org
GNU General Public License v2.0

eMonocot: export Phylogeny and key files in the DwC-A #3558

Closed informatics-dev closed 10 years ago

informatics-dev commented 11 years ago

Description:

To store character matrix data in the scratchpads (as a precursor to a fully developed character project module) along with phylogeny data, and to enable the eMonocot Portal to harvest both, Ben C suggests the following:

Users can upload SDD files and Phylogeny Files (Nexus / Newick / New-Hampshire Extended / NeXML / PhyloXML - pick the ones you think are relevant) to the scratchpad. We might need to specify the file extensions which these files should have:

SDD = .xml
Nexus = .nex
Phylip = .phy
New-Hampshire Extended = .nhx
NeXML = .xml
PhyloXML = .xml

These files (if published) should be exported in the images.txt file in the Darwin Core archive (see https://docs.google.com/spreadsheet/ccc?key=0AsnL4wtLP8y0dGhzWkRmTmdabDdOamlwelpYS3VaTEE&usp=sharing). For the import to work we require the following fields:

dc:identifier = the URI of the file in the scratchpad
dc:format = the MIME type of the file, e.g.:

SDD = application/xml
Nexus = text/plain
Phylip = text/plain
New-Hampshire Extended = text/plain
NeXML = application/xml
PhyloXML = application/xml

It would also be nice to have other fields:

dwc:taxonID – the checklist ID of e.g. the root taxon for the key or phylogeny
dc:references – the node in the scratchpad, same as identifier
dc:creator – the creator of the file
dc:description – longer description
dc:title – short title
dc:subject – keywords

but these are not essential.
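
The two required fields can be derived from a file's URI alone. The following is a minimal Python sketch of that mapping, not the Scratchpads implementation: the function names `media_row` and `write_rows` are hypothetical, and the tab-separated layout with a header line is an assumption about images.txt. The extension-to-MIME table is taken from the lists above.

```python
import csv
import io
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension -> MIME type, as listed in this issue.
MIME_BY_EXTENSION = {
    ".xml": "application/xml",  # SDD, NeXML, PhyloXML
    ".nex": "text/plain",       # Nexus
    ".phy": "text/plain",       # Phylip
    ".nhx": "text/plain",       # New-Hampshire Extended
}


def media_row(file_uri):
    """Return the two required fields for a file, or None if its extension is not listed."""
    suffix = PurePosixPath(urlparse(file_uri).path).suffix.lower()
    mime = MIME_BY_EXTENSION.get(suffix)
    if mime is None:
        return None
    return {"dc:identifier": file_uri, "dc:format": mime}


def write_rows(file_uris):
    """Render rows as tab-separated text with a header line (layout is an assumption)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["dc:identifier", "dc:format"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for uri in file_uris:
        row = media_row(uri)
        if row is not None:
            writer.writerow(row)
    return out.getvalue()
```

Because SDD, NeXML and PhyloXML all share the .xml extension and the application/xml MIME type, these two fields cannot tell the three formats apart; a consumer would have to inspect the file content.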

informatics-dev commented 11 years ago

Comment by Simon Rycroft

I have added the required file extensions.

informatics-dev commented 11 years ago

Comment by Alice Heaton

How would users upload the files? Would they upload them as images or as 'other' file types?

If they are uploaded as 'other' file types, how would we distinguish the XML files we want to include in the export from those we do not?

informatics-dev commented 11 years ago

Comment by Alice Heaton

Adding an option "include in DwC-A archive" to all files would clutter the interface for a feature that will not be used often.

An alternative would be to use a naming convention - so that XML files that should be included in DwC-A archives are named "myfile.dwca.xml".
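
As a rough illustration of such a naming convention, the check could look like the Python sketch below. The function name `marked_for_dwca` and its `extensions` parameter are hypothetical and only show the idea; this is not code from the Scratchpads export.

```python
def marked_for_dwca(filename, extensions=("xml",)):
    """Return True if the name has the form '<stem>.dwca.<ext>' for an allowed extension."""
    # Split from the right so that dots inside the stem are left alone.
    parts = filename.lower().rsplit(".", 2)
    return (
        len(parts) == 3
        and parts[0] != ""
        and parts[1] == "dwca"
        and parts[2] in extensions
    )
```

With the default arguments, "myfile.dwca.xml" is accepted and "myfile.xml" is not; other extensions can be allowed by passing a different `extensions` tuple.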

informatics-dev commented 11 years ago

Comment by Sarah Phillips

Hi Alice, not sure if you want feedback from us or from the rest of the scratchpad team. We are happy for all XML files to be included in the archive; however, if you want to provide a way of excluding files, such as this naming convention, then that's fine. Note that most phylogenies are not in XML format, but we could adopt a similar convention, e.g. *.dwca.nex or *.dwca.nwk. For the portal it does not matter if other XML files are included in the export, as by default the harvester will skip all non-image files anyway. If the phylogeny and identification key harvesting is enabled, the harvester will try to detect the content type by downloading the XML file and examining the XML namespace of its root element.
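
The namespace check described here can be sketched in a few lines of Python. This is an illustration only, not the eMonocot harvester code: the function names are hypothetical, the download step is left out, and the namespace URIs in the table are assumptions that should be verified against each format's schema.

```python
import xml.etree.ElementTree as ET

# Assumed root-element namespaces; verify against the SDD, NeXML and PhyloXML schemas.
FORMAT_BY_NAMESPACE = {
    "http://rs.tdwg.org/UBIF/2006/": "SDD",
    "http://www.nexml.org/2009": "NeXML",
    "http://www.phyloxml.org": "PhyloXML",
}


def root_namespace(xml_data):
    """Return the namespace URI of the root element, or '' if it has none."""
    root = ET.fromstring(xml_data)
    # ElementTree writes a namespaced tag as "{namespace}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def detect_format(xml_data):
    """Return the format name for a known root namespace, else None."""
    return FORMAT_BY_NAMESPACE.get(root_namespace(xml_data))
```

Any XML file whose root namespace is not in the table yields `None`, which is why unrelated XML files in the export would simply be skipped by a consumer working this way.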

informatics-dev commented 11 years ago

Comment by Alice Heaton

Thanks for your feedback. After discussions here we're also happy to include all XML files. I will do this today.

informatics-dev commented 11 years ago

Comment by Alice Heaton

Which Dublin Core Metadata Initiative type should be used for those files? The options are Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage and Text.

I will assume Dataset is the correct one - let me know if not.

informatics-dev commented 11 years ago

Comment by Alice Heaton

I have done this and created a branch, 2295-export-keyfiles-dwca, for testing.

Is there a particular site you would like to test this on?

Note that at this stage the files don't have a license (only images have a license field). If you need those files to have a license, this should be opened as a separate issue (once a license field is added to other file types, these would automatically be included in the DwC-A export).

informatics-dev commented 11 years ago

Comment by Alice Heaton

I've asked Ed to review the changes (to ensure this does not break compatibility with other users of the DwC-A) and am assigning this to the support team for testing.

informatics-dev commented 11 years ago

Comment by Laurence Livermore

As this was an eMonocot feature request I have asked Serene which site(s) already contain these files for the purposes of testing.

informatics-dev commented 11 years ago

Comment by Serene Hargreaves

http://lomandroideae.e-monocot.org/ has an .xml (SDD) file: http://lomandroideae.e-monocot.org/sites/lomandroideae.e-monocot.org/files/LomandroideaeGeneraForScratchpad.xml

http://families.e-monocot.org/ has a .nex (Nexus) file: http://families.e-monocot.org/sites/families.e-monocot.org/files/Monocot_Genera.nex

informatics-dev commented 11 years ago

Comment by Laurence Livermore

Alice, I cannot find the branch "2295-export-keyfiles-dwca" for testing in Aegir. Can you let me know when it's available?

informatics-dev commented 11 years ago

Comment by Simon Rycroft

I have created the platform.

informatics-dev commented 10 years ago

Comment by Laurence Livermore

I attempted to clone http://lomandroideae.e-monocot.org/ to the platform but was unable to (the option was not selectable).

I verified the platform but still could not clone the site to platform "2295-export-keyfiles-dwca".

informatics-dev commented 10 years ago

Comment by Simon Rycroft

Can you merge the master branch with the 2295 branch please, Alice?

informatics-dev commented 10 years ago

Comment by Alice Heaton

I've merged the branch (there were some conflicts that required manual resolution), updated the code on Quartz and verified the platform, so it should now be ready to test (as long as your test site name starts with "dev." or "dev-"; otherwise wait a bit for the code on Silica to sync).

I did not re-test the fix, however; I don't think there is a conflict with the other code that was added to the export.

informatics-dev commented 10 years ago

Comment by Laurence Livermore

eMonocot sites cloned to the branch for Redmine issue 2295:

http://dev-lomandroideae.taxon.name/ http://dev-families.taxon.name/

informatics-dev commented 10 years ago

Comment by Simon Rycroft

The DwC-A file has been rebuilt. Please check it as soon as possible.

informatics-dev commented 10 years ago

Comment by Laurence Livermore

Works as expected:

Links to the .xml and .nex files are present in the "images.txt" file in the DwCA.zip from both sites.
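
A check like this can also be scripted. The Python sketch below is one possible way to do it under stated assumptions: the function name `media_links` is hypothetical, the archive is assumed to contain a file named images.txt, and the links are picked out with a simple URL pattern so that the column delimiter does not matter.

```python
import re
import zipfile


def media_links(archive, suffixes=(".xml", ".nex")):
    """Return the URLs in the archive's images.txt that end with one of the suffixes."""
    with zipfile.ZipFile(archive) as zf:
        # Accept images.txt at the archive root or inside a folder.
        name = next(
            (n for n in zf.namelist() if n.rsplit("/", 1)[-1] == "images.txt"),
            None,
        )
        if name is None:
            raise FileNotFoundError("images.txt not found in archive")
        text = zf.read(name).decode("utf-8")
    # Pull out anything that looks like a URL, regardless of the column delimiter.
    urls = re.findall(r'https?://[^\s",]+', text)
    return [u for u in urls if u.lower().endswith(tuple(suffixes))]
```

Calling `media_links("DwCA.zip")` on a downloaded archive returns the matching links in file order; an empty list would mean no .xml or .nex files were exported.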

informatics-dev commented 10 years ago

Comment by Simon Rycroft

The branch has been merged into master and will be included in the next release.