OpenTreeOfLife / opentree

Opentree browsing and curation web site. For overarching or cross-repo concerns, please see the 'germinator' repo.
http://tree.opentreeoflife.org/
BSD 2-Clause "Simplified" License

Curator newick tip labels #514

Open josephwb opened 9 years ago

josephwb commented 9 years ago

A study may have all taxa mapped to the taxonomy, but the newick from the curator has tip labels from the original file.

For example, the newick for study ot_211 is:

(((Symphyle:0.568537,Glomeris:0.460981):0.06919,(((Lepeopht:0.912254,'DAPHNIA ':0.549582):0.065099,(Speleone:0.464261,((Acerento:0.57184,(((Tetrodon:0.287743,'Anurida ':0.378074):0.080217,(Pogonogn:0.238871,Folsomia:0.383107):0.052091):0.039124,Sminthur:0.31149):0.434428):0.053307,((Campodea:0.493337,Occasjap:0.322626):0.101766,((Meinerte:0.179927,Machilis:0.170968):0.177278,(((Thermobi:0.178027,'Atelura ':0.204904):0.025437,Trichole:0.187946):0.050517,((('Baetis p':0.301549,((Euryloph:0.121,Ephemera:0.139545):0.041681,Isonychi:0.126829):0.057444):0.214859,((Corduleg:0.072497,Epiophle:0.054396):0.031975,Calopter:0.119125):0.247966):0.034797,((((Forficul:0.181582,Apachyus:0.143241):0.264999,Zorotypu:0.34234):0.029352,((('Perla ma':0.184353,Cosmiope:0.191505):0.023196,'Leuctra ':0.196142):0.099631,((((Metallyt:0.119994,('Empusa p':0.042587,'Mantis r':0.043588):0.03352):0.111187,(((Cryptoce:0.064992,(Mastoter:0.088811,(Prorhino:0.088732,ZOOTERMO:0.069085):0.007359):0.011533):0.020489,Periplan:0.088823):0.018148,Blaberus:0.145283):0.0346):0.052622,(((Haploemb:0.072304,Aposthon:0.061362):0.230244,((Peruphas:0.083459,'Aretaon ':0.070504):0.146067,'Timema c':0.200933):0.041519):0.045011,(Tanzanio:0.195818,(Galloisi:0.046916,Gryllobl:0.065003):0.145927):0.035716):0.02133):0.015154,(('Tetrix s':0.184767,(Prosarth:0.139318,Stenobot:0.119167):0.077786):0.08259,(Gryllota:0.258168,Ceuthoph:0.152565):0.031669):0.02996):0.015902):0.022723):0.02954,(((Gynaikot:0.327608,(Franklin:0.095394,'Thrips p':0.099975):0.179184):0.156542,(((Trialeur:0.115202,'Bemisia ':0.120042):0.30678,(((Essigell:0.079111,(ACYRTOSI:0.02318,'Aphis go':0.063978):0.054857):0.488401,Planococ:0.468437):0.066219,Acanthoc:0.533639):0.033164):0.059885,(((Acanthos:0.291746,Notostir:0.308472):0.077316,('Ranatra ':0.259408,'Velia 
ca':0.384397):0.036093):0.144271,(Xenophys:0.334551,((Cercopis:0.184555,Okanagan:0.169085):0.065817,Nilaparv:0.332343):0.021145):0.024215):0.035362):0.065842):0.02764,(((Tenthred:0.140433,('Orussus ':0.17036,(('Cotesia ':0.258038,(Leptopil:0.197718,'NASONIA ':0.197835):0.026536):0.015116,(((Exoneura:0.137889,('APIS MEL':0.04941,'Bombus t':0.051013):0.02096):0.067017,(ACROMYRM:0.09047,Harpegna:0.087591):0.063031):0.023893,'Chrysis ':0.136751):0.036358):0.014408):0.037364):0.226106,((((((Dendroct:0.346725,('Meloe vi':0.183453,TRIBOLIU:0.124854):0.058778):0.058185,Aleochar:0.329651):0.111623,((('Gyrinus ':0.202057,'Carabus ':0.187903):0.058419,'Priacma ':0.269338):0.020455,Lepiceru:0.3841):0.025135):0.025374,(Mengenil:0.429229,'Stylops ':0.414515):0.492404):0.039703,(((Conwentz:0.509811,('Osmylus ':0.182393,(Dichochr:0.208568,Euroleon:0.138564):0.06395):0.037957):0.025617,(Corydalu:0.18098,'Sialis l':0.278779):0.043068):0.026864,(Inocelli:0.12105,Xanthost:0.102844):0.139726):0.087298):0.054996,((((ANOPHELE:0.178418,'Aedes AE':0.123988):0.195578,(('Bibio ma':0.329412,(Bombyliu:0.200367,(DROSOPHI:0.219201,('Lipara l':0.178729,(Rhagolet:0.205698,(Glossina:0.188522,(Sarcopha:0.095418,Triarthr:0.084332):0.044006):0.047335):0.013814):0.02522):0.159561):0.090898):0.031641,((Trichoce:0.220904,'Tipula m':0.299583):0.040387,Phleboto:0.35488):0.017375):0.032462):0.209067,((Ceratoph:0.101562,(Archaeop:0.083638,Ctenocep:0.052222):0.090332):0.172571,((Nannocho:0.234942,(Bittacus:0.152291,'Panorpa ':0.138756):0.063648):0.020935,'Boreus h':0.208929):0.029395):0.105079):0.06079,(((Eriocran:0.253955,('Triodia ':0.201409,(Nemophor:0.199228,(Yponomeu:0.140003,('Zygaena ':0.166237,((Polyomma:0.150456,'Parides ':0.117619):0.021214,('BOMBYX M':0.128039,'Manduca 
':0.118582):0.031152):0.018642):0.026248):0.108564):0.083038):0.029851):0.076576,Micropte:0.51463):0.051551,((Rhyacoph:0.201101,Platycen:0.190053):0.049171,(Hydropti:0.268521,(Philopot:0.231323,Cheumato:0.174586):0.072536):0.025424):0.100369):0.174219):0.055865):0.057356):0.048373,(Ectopsoc:0.392275,(('Menopon ':0.180633,PEDICULU:0.231111):0.037605,Liposcel:0.16484):0.086963):0.200856):0.019791):0.028051):0.04606):0.028051):0.059207):0.066755):0.032421):0.022576):0.023328):0.040441,(Cypridin:0.663119,(Sarsineb:0.520149,(Litopena:0.222553,'Celuca p':0.226605):0.139701):0.243247):0.067686):0.09751):0.338501,'IXODES S':0.338501);

I think people are only going to want mapped labels, right?

jar398 commented 9 years ago

Yes. We are advertising mapped-to-taxonomy as a key value-add of the system. I guess original labels could be an option, but that makes for UI clutter.

jar398 commented 9 years ago

If they want the original labels they're in the nexus file. But on the other hand, what should the labels be in the Newick file? We'll have to put something there.

kcranston commented 9 years ago

I think I agree with @josephwb here. The labels in the newick file should be the mapped labels. If you are downloading the tree from opentree, that tree should have the curated data (where applicable, depending on file type). We do keep the original uploaded / pasted file, which would have the original labels. Or the label type could be an option when downloading the newick.

jar398 commented 9 years ago

Sorry, I wasn't clear. For tips that don't map to OTT, what should the labels be in the Newick file?

mtholder commented 9 years ago

I need to document this better, but there is an optional argument tip_label to the GET in phylesystem. The 3 currently supported values for that are: 'ot:originalLabel', 'ot:ottId', 'ot:ottTaxonName' (the values are not case sensitive). So, you can contrast:

https://devapi.opentreeoflife.org/phylesystem/v1/study/pg_2510/tree/tree5405?format=newick&tip_label=ot:originalLabel

https://devapi.opentreeoflife.org/phylesystem/v1/study/pg_2510/tree/tree5405?format=newick&tip_label=ot:ottID

https://devapi.opentreeoflife.org/phylesystem/v1/study/pg_2510/tree/tree5405?format=newick&tip_label=ot:ottTaxonName
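
For anyone scripting against this, here is a minimal sketch of building those URLs in Python. The URL layout and the three tip_label values are taken from the comment above; the function name `newick_url` and the constant names are made up for illustration, and nothing here is an official client. It only builds the URL string and does not contact the server.

```python
from urllib.parse import urlencode

# Base URL pattern copied from the example links in this thread.
BASE = "https://devapi.opentreeoflife.org/phylesystem/v1"

# The three values listed above, keyed by lowercase form because the
# server is said to treat them case-insensitively.
TIP_LABELS = {
    "ot:originallabel": "ot:originalLabel",
    "ot:ottid": "ot:ottId",
    "ot:otttaxonname": "ot:ottTaxonName",
}


def newick_url(study_id, tree_id, tip_label="ot:ottTaxonName"):
    """Return the Newick download URL for one tree (hypothetical helper)."""
    try:
        label = TIP_LABELS[tip_label.lower()]
    except KeyError:
        raise ValueError(f"unsupported tip_label: {tip_label!r}")
    # safe=":" keeps the colon readable instead of encoding it as %3A.
    query = urlencode({"format": "newick", "tip_label": label}, safe=":")
    return f"{BASE}/study/{study_id}/tree/{tree_id}?{query}"


if __name__ == "__main__":
    for choice in ("ot:originalLabel", "ot:ottId", "ot:ottTaxonName"):
        print(newick_url("pg_2510", "tree5405", choice))
```

Rejecting unknown values up front is just a convenience in this sketch; how the server itself responds to an unsupported tip_label is not stated in this thread.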

mtholder commented 9 years ago

and in answer to @jar398 's point: the mapped tips show up as things like: '_unlabeled tip #1','_unlabeled tip #2', etc. I'm happy to change that behavior if anyone has a better suggestion.

jar398 commented 9 years ago

You mean the unmapped tips I presume. Sorry to quibble - it's what I do - but 'unlabeled tip #1' is wrong in two ways:

  1. it's a label, therefore it is a lie.
  2. the tip is labeled in the original file - I read this as saying that there was no label in the original source, which is also not true.

Maybe we could use a label like 'original label - Rana pipiens' or 'not mapped to OTT - Rana pipiens' ? Just a thought.
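
To make the suggestion concrete, here is a rough sketch of how such a label could be generated. It is only an illustration of the 'not mapped to OTT - ...' idea above, not how phylesystem currently behaves; the function names and the prefix default are hypothetical. Quoting follows the usual Newick convention: labels containing whitespace or punctuation go in single quotes, with embedded single quotes doubled.

```python
import re

# Labels made only of these "safe" characters can be written unquoted.
_UNQUOTED_OK = re.compile(r"^[^\s()\[\]':;,]+$")


def newick_quote(label):
    """Quote a label for Newick output if it needs quoting."""
    if _UNQUOTED_OK.match(label):
        return label
    return "'" + label.replace("'", "''") + "'"


def tip_label(original_label, ott_name=None, prefix="not mapped to OTT - "):
    """Use the mapped OTT name if there is one, else a flagged original label.

    `ott_name` is None (or empty) for tips that have not been mapped.
    """
    if ott_name:
        return newick_quote(ott_name)
    return newick_quote(prefix + original_label.strip())


if __name__ == "__main__":
    print(tip_label("Rana pipiens"))               # unmapped tip
    print(tip_label("DAPHNIA ", "Daphnia pulex"))  # mapped tip
```

With this scheme an unmapped tip keeps its original text, so the reader can tell both that it is unmapped and what the source file called it.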

mtholder commented 9 years ago

It would also be nice to see the OTT IDs in the tree browser - perhaps that belongs in a separate issue...

josephwb commented 8 years ago

Similar: https://github.com/OpenTreeOfLife/opentree/issues/743