Closed bedroesb closed 5 years ago
@bedroesb nice one, you beat me to it! I was about to push the changes. Relates to https://github.com/MIAPPE/ISA-Tab-for-plant-phenotyping/issues/17 @PapoutsoglouE
Well it was a small thingy so sorry about that ;) I don't really see a problem with the if block to be honest.
I know that I need to change the block when taxonIDs are given through BrAPI.
@bedroesb regarding the if block: in order to be consistent, I think we need to make sure we use a similar pattern, so:

```python
if 'taxonId' in all_germplasm_attributes and all_germplasm_attributes['taxonId']:
    taxonids = []
    organism = "multiple organisms"
    ncbitaxon = OntologySource(name='NCBITaxon', description="NCBI Taxonomy")
    for taxonid in all_germplasm_attributes['taxonId']:
        taxonids.append(att_test(taxonid, 'sourceName', 'NCBI') + ":" + str(taxonid['taxonId']))
    c = self.create_isa_characteristic('Organism', organism, ';'.join(taxonids), ncbitaxon.name, ';'.join(taxonids))
    returned_characteristics.append(c)
```
sorry didn't test
The attribute taxonIds looks like this:

```json
"taxonIds": [
    {
        "sourceName": "ncbiTaxon",
        "taxonId": "2340"
    },
    {
        "sourceName": "ciradTaxon",
        "taxonId": "E312"
    }
],
```
So the problem is how to handle the URI when it is not a NCBI taxon.
If I assume it is always an NCBI taxon ID, then it is an easy thing to implement indeed.
I guess we can just look for an entry with sourceName == ncbiTaxon and then take the value delivered in its taxonId; otherwise, use the current implementation (using the genus and species).
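A minimal sketch of that lookup, not the actual BrAPI2ISA code: `pick_ncbi_taxon_id` is a hypothetical helper name, and the case-insensitive comparison against `ncbiTaxon` is an assumption about how endpoints spell the source name. It returns `None` when no NCBI entry is present, so the caller can fall back to genus and species.

```python
from typing import Dict, List, Optional


def pick_ncbi_taxon_id(taxon_ids: Optional[List[Dict[str, str]]]) -> Optional[str]:
    """Return the taxonId of the first entry whose sourceName is ncbiTaxon.

    Hypothetical helper: returns None when the list is missing or empty,
    or when it holds no NCBI entry, so the caller can fall back to
    genus and species.
    """
    for entry in taxon_ids or []:
        # Assumption: source names are compared case-insensitively.
        is_ncbi = str(entry.get("sourceName", "")).lower() == "ncbitaxon"
        if is_ncbi and entry.get("taxonId"):
            return str(entry["taxonId"])
    return None


# The example payload from this thread.
example_taxon_ids = [
    {"sourceName": "ncbiTaxon", "taxonId": "2340"},
    {"sourceName": "ciradTaxon", "taxonId": "E312"},
]

ncbi_id = pick_ncbi_taxon_id(example_taxon_ids)
```

With the payload above, only the `ncbiTaxon` entry is used and the `ciradTaxon` entry is ignored, which avoids concatenating identifiers from different sources.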
Right, but off the top of my head I can't remember which situation multiple taxonIds correspond to: either there is one species+genus and the multiple taxonIds list 'alternate identifiers' for the same organism, or they define a hybrid organism, where it is necessary to list all the different taxons from the parent lines.
Either way, the concatenation resulting from the multiple entries will not necessarily be pretty in a tabular format.
true that, I am changing it
@proccaserra I've made a new function to make things more logical.
I will add some documentation to it
The WUR endpoint delivered the URI link as taxonId, while the Portuguese one gave the NCBI ID itself, but this is handled in the script now.
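For illustration only, one way to handle both shapes is to keep the trailing run of digits. This is a sketch under assumptions, not the function in the script: `normalize_ncbi_taxon_id` is a hypothetical name, and the URI used in the example is the BioPortal-style accession that appears in the tables of this thread, which may differ from what the WUR endpoint actually returns.

```python
import re


def normalize_ncbi_taxon_id(raw: str) -> str:
    """Return the numeric NCBI taxon ID from a plain ID or a URI.

    Hypothetical helper: assumes the numeric ID is the trailing run of
    digits in the value (optionally followed by a slash).
    """
    match = re.search(r"(\d+)/?$", raw.strip())
    if match is None:
        raise ValueError("no numeric taxon ID found in {!r}".format(raw))
    return match.group(1)


# A URI-style value and a plain ID both reduce to the bare number.
from_uri = normalize_ncbi_taxon_id("http://purl.bioontology.org/ontology/NCBITAXON/4081")
from_plain = normalize_ncbi_taxon_id("58331")
```

Raising on values without a trailing number keeps non-NCBI identifiers such as `E312` from being silently treated as NCBI IDs.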
Take another look at the crosslinked issue on the MIAPPE side. I am not sure that this is the best option, so let's still consider some alternatives!
So you propose an extra column called Characteristics[NCBI] with the NCBI ID? Not a problem at all to implement.
VIB:
Source Name | Characteristics[NCBI] | Term Source REF | Term Accession Number | Characteristics[Organism] | Characteristics[Genus] | Characteristics[Species] | Protocol REF | Sample Name | Characteristics[Observation Unit Type] | Characteristics[Spatial Distribution] | Factor Value[water regimen]0 | Factor Value[water regimen]1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
OE-2-1 | Arabidopsis thaliana | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/3702 | NCBI:3702 | Arabidopsis | thaliana | Growth | pot_13 | plant | [plant]13 | jobau_wellwatered_3-9DAS | jobau_wellwatered_3-9DAS |
OE-2-1 | Arabidopsis thaliana | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/3702 | NCBI:3702 | Arabidopsis | thaliana | Growth | pot_27 | plant | [plant]27 | jobau_wellwatered_3-9DAS | jobau_wellwatered_3-9DAS |
OE-2-1 | Arabidopsis thaliana | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/3702 | NCBI:3702 | Arabidopsis | thaliana | Growth | pot_24 | plant | [plant]24 | jobau_wellwatered_3-9DAS | jobau_wellwatered_3-9DAS |
OE-2-1 | Arabidopsis thaliana | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/3702 | NCBI:3702 | Arabidopsis | thaliana | Growth | pot_3 | plant | [plant]3 | jobau_wellwatered_3-9DAS | jobau_wellwatered_3-9DAS |
OE-2-1 | Arabidopsis thaliana | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/3702 | NCBI:3702 | Arabidopsis | thaliana | Growth | pot_17 | plant | [plant]17 | jobau_wellwatered_3-9DAS | jobau_wellwatered_3-9DAS |
PT:
Source Name | Characteristics[NCBI] | Term Source REF | Term Accession Number | Characteristics[Organism] | Characteristics[Genus] | Characteristics[Species] | Characteristics[Material Source ID] | Protocol REF | Sample Name | Characteristics[Observation Unit Type] | Characteristics[Spatial Distribution] |
---|---|---|---|---|---|---|---|---|---|---|---|
Cork oak Barradas daSerra 03 | Quercus suber | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/58331 | NCBI:58331 | Quercus | suber | INIAV:BS03 | Growth | BS3 | plantnumber | [block]1; [plot]1; [plant]BS3; [replicate]1 |
Corkoak Barradas da Serra 04 | Quercus suber | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/58331 | NCBI:58331 | Quercus | suber | INIAV:BS04 | Growth | BS4 | plantnumber | [block]1; [plot]1; [plant]BS4; [replicate]2 |
Corkoak Barradas da Serra 05 | Quercus suber | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/58331 | NCBI:58331 | Quercus | suber | INIAV:BS05 | Growth | BS5 | plantnumber | [block]1; [plot]1; [plant]BS5; [replicate]3 |
Corkoak Barradas da Serra 06 | Quercus suber | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/58331 | NCBI:58331 | Quercus | suber | INIAV:BS06 | Growth | BS6 | plantnumber | [block]1; [plot]1; [plant]BS6; [replicate]4 |
Corkoak Barradas da Serra 07 | Quercus suber | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/58331 | NCBI:58331 | Quercus | suber | INIAV:BS07 | Growth | BS7 | plantnumber | [block]1; [plot]1; [plant]BS7; [replicate]5 |
WUR:
Source Name | Characteristics[NCBI] | Term Source REF | Term Accession Number | Characteristics[Organism] | Characteristics[Genus] | Characteristics[Species] | Characteristics[Material Source ID] | Characteristics[Material Source DOI] | Protocol REF | Sample Name | Characteristics[Observation Unit Type] | Characteristics[Spatial Distribution] | Factor Value[fruit load] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S. lycopersicum cv. M82 | Solanum lycopersicum | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/4081 | NCBI:4081 | Solanum | lycopersicum | EA10004 | https://www.eu-sol.wur.nl/rdf/accession/EA10004 | Growth | 29302110 | plant | [X]2110; [plot]0; [plant]29302110; [replicate]1 | low (pruned till one fruit) |
S. lycopersicum cv. M82 | Solanum lycopersicum | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/4081 | NCBI:4081 | Solanum | lycopersicum | EA10004 | https://www.eu-sol.wur.nl/rdf/accession/EA10004 | Growth | 29301054 | plant | [X]1054; [plot]0; [plant]29301054; [replicate]1 | low (pruned till one fruit) |
S. lycopersicum cv. M82 | Solanum lycopersicum | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/4081 | NCBI:4081 | Solanum | lycopersicum | EA10004 | https://www.eu-sol.wur.nl/rdf/accession/EA10004 | Growth | 29301824 | plant | [X]1824; [plot]0; [plant]29301824; [replicate]1 | low (pruned till one fruit) |
S. lycopersicum cv. M82 | Solanum lycopersicum | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/4081 | NCBI:4081 | Solanum | lycopersicum | EA10004 | https://www.eu-sol.wur.nl/rdf/accession/EA10004 | Growth | 29302127 | plant | [X]2127; [plot]0; [plant]29302127; [replicate]1 | low (pruned till one fruit) |
S. lycopersicum cv. M82 | Solanum lycopersicum | NCBITaxon | http://purl.bioontology.org/ontology/NCBITAXON/4081 | NCBI:4081 | Solanum | lycopersicum | EA10004 | https://www.eu-sol.wur.nl/rdf/accession/EA10004 | Growth | 29301317 | plant | [X]1317; [plot]0; [plant]29301317; [replicate]1 | low (pruned till one fruit) |
@PapoutsoglouE
Please check my post on the related issue on the MIAPPE github: https://github.com/MIAPPE/ISA-Tab-for-plant-phenotyping/issues/17#issuecomment-524937373
If the goal is for BrAPI2ISA to generate MIAPPE-compliant ISA-Tab, then what I said there holds here as well. We should not be modeling Organism in a way that differs from the MIAPPE 1.1 checklist, even if that means we cannot use some of the functionalities from ISA.
@DanFaria
So if I am following correctly, it will stay the same as it was (so without the Characteristics[NCBI], Term Source REF and Term Accession Number columns), but with NCBITAXON:xxxx instead of NCBI:xxxx for the Characteristics[Organism] column.
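To make the proposed change concrete, here is a rough sketch of the row transformation it implies. It is only an illustration: `to_miappe_row` is a hypothetical helper operating on a plain dict per row, whereas the real converter builds ISA objects rather than dicts.

```python
from typing import Dict

# Columns from the earlier proposal that would be dropped again.
DROPPED_COLUMNS = ("Characteristics[NCBI]", "Term Source REF", "Term Accession Number")


def to_miappe_row(row: Dict[str, str]) -> Dict[str, str]:
    """Drop the extra ontology columns and switch the prefix to NCBITAXON."""
    new_row = {key: value for key, value in row.items() if key not in DROPPED_COLUMNS}
    organism = new_row.get("Characteristics[Organism]", "")
    if organism.startswith("NCBI:"):
        new_row["Characteristics[Organism]"] = "NCBITAXON:" + organism[len("NCBI:"):]
    return new_row


# A shortened row in the earlier layout, taken from the VIB example.
old_row = {
    "Source Name": "OE-2-1",
    "Characteristics[NCBI]": "Arabidopsis thaliana",
    "Term Source REF": "NCBITaxon",
    "Term Accession Number": "http://purl.bioontology.org/ontology/NCBITAXON/3702",
    "Characteristics[Organism]": "NCBI:3702",
    "Characteristics[Genus]": "Arabidopsis",
    "Characteristics[Species]": "thaliana",
}

new_row = to_miappe_row(old_row)
```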
@bedroesb Yes, I think that is the best solution, as I don't see a way to improve functionality on the ISA side without deviating from the MIAPPE checklist. I would give it a couple of days to see if anyone expresses a different opinion on the pending MIAPPE ISA-Tab issue, but after that, I think you can go ahead with that configuration.
Eliana has already posted an issue on the MIAPPE checklist to update the NCBI prefix to NCBITAXON, and hopefully that can be done still within the MIAPPE 1.1 release, as it is a non-functional change.
@bedroesb @DanFaria I guess the ambiguity lies in the fact that for MIAPPE organism, an identifier is expected, where intuitively an organism name would be supplied (following the pattern for Genus and Species).
So maybe a minor change would be to use 'organism ID' in both MIAPPE and the ISA configuration to remove that uncertainty.
> So maybe a minor change would be to use 'organism ID' in both MIAPPE and the ISA configuration to remove that uncertainty.
I agree that this would make the field more intuitive. I'll raise the issue on the MIAPPE checklist, and if approved, we can update the ISA configuration.
WUR:
Source Name | Characteristics[Organism] | Characteristics[Genus] | Characteristics[Species] | Characteristics[Material Source ID] | Characteristics[Material Source DOI] | Protocol REF | Sample Name | Characteristics[Observation Unit Type] | Characteristics[Spatial Distribution] | Factor Value[fruit load] |
---|---|---|---|---|---|---|---|---|---|---|
S. lycopersicum cv. M82 | NCBITAXON:4081 | Solanum | lycopersicum | EA10004 | https://www.eu-sol.wur.nl/rdf/accession/EA10004 | Growth | 29301824 | plant | X:1824;plot:0;plant:29301824;replicate:1 | low (pruned till one fruit) |
S. lycopersicum cv. M82 | NCBITAXON:4081 | Solanum | lycopersicum | EA10004 | https://www.eu-sol.wur.nl/rdf/accession/EA10004 | Growth | 29301642 | plant | X:1642;plot:0;plant:29301642;replicate:1 | low (pruned till one fruit) |
PT:
Source Name | Characteristics[Organism] | Characteristics[Genus] | Characteristics[Species] | Characteristics[Material Source ID] | Protocol REF | Sample Name | Characteristics[Observation Unit Type] | Characteristics[Spatial Distribution] |
---|---|---|---|---|---|---|---|---|
Cork oak Barradas daSerra 03 | NCBITAXON:58331 | Quercus | suber | INIAV:BS03 | Growth | BS3 | plantnumber | block:1;plot:1;plant:BS3;replicate:1 |
Corkoak Barradas da Serra 04 | NCBITAXON:58331 | Quercus | suber | INIAV:BS04 | Growth | BS4 | plantnumber | block:1;plot:1;plant:BS4;replicate:2 |
VIB:
Source Name | Characteristics[Organism] | Characteristics[Genus] | Characteristics[Species] | Protocol REF | Sample Name | Characteristics[Observation Unit Type] | Characteristics[Spatial Distribution] | Factor Value[water regimen] |
---|---|---|---|---|---|---|---|---|
OE-2-1 | NCBITAXON:3702 | Arabidopsis | thaliana | Growth | pot_10 | plant | plant:10 | jobau_wellwatered_10-21DAS,jobau_wellwatered_3-9DAS |
OE-2-1 | NCBITAXON:3702 | Arabidopsis | thaliana | Growth | pot_24 | plant | plant:24 | jobau_drought_10-21DAS,jobau_wellwatered_3-9DAS |
The VIB example also includes the fix for the treatments problem mentioned before.
Off the top of my head, I don't recall any of the WUR germplasm having S. lycopersicum in their name/ID. I am also unsure where the cv. M82 came from.
@bedroesb, could you elaborate on how the Source Name is formed in this case?
(I may be misremembering and there might indeed be germplasm with that information)
(Also, the format for Spatial Distribution has been changed from using square brackets to colons, i.e. from [block] 1;[plot] 2 to block:1;plot:2.)
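As a sketch of that format change, the conversion can be expressed on an already-formatted string. `brackets_to_colons` is a hypothetical helper written for illustration; the converter itself presumably builds the new string directly instead of rewriting the old one.

```python
import re


def brackets_to_colons(value: str) -> str:
    """Rewrite '[level]value' pairs as 'level:value', joined with ';'."""
    converted = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"\[(\w+)\]\s*(.*)", part)
        # Parts that do not follow the bracket pattern are kept unchanged.
        converted.append("{}:{}".format(match.group(1), match.group(2)) if match else part)
    return ";".join(converted)


# Old-style value from the PT example earlier in this thread.
new_value = brackets_to_colons("[block]1; [plot]1; [plant]BS3; [replicate]1")
```

Here `new_value` is `block:1;plot:1;plant:BS3;replicate:1`, matching the updated PT example.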
I double checked, and indeed our database has some entries with that germplasm name. Apologies!
No problem! I just updated the examples in my previous post with the latest code changes concerning Characteristics[Spatial Distribution]
New format:
generated with: