cpauvert / mi-atlas

Interactive and evolving catalogue of microbial interactions based on the framework of Pacheco and Segrè (2019)
GNU General Public License v3.0

Include NCBI TaxID alongside the participants names #16

Open cpauvert opened 3 years ago

cpauvert commented 3 years ago

This seems feasible with the taxize R package from rOpenSci, as in the following example:

library(taxize) # might need an API key for entrez
get_uid("Zygotorulaspora florentina", rank_query="species")
# Returns the NCBI taxonomy UID (48255 here), or NA if the name is not found

cpauvert commented 3 years ago

With issue #18 in mind, I started working on this some time ago, and a few problems arose.

The NCBI TaxID search failed for some participant names (e.g. row 1 in the table below), probably because of a trailing sp. or spp., which seems easy to tackle.
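Stripping the trailing sp. or spp. could look roughly like the sketch below. It uses base R only; `strip_sp_suffix` is a hypothetical helper name, not something that exists in mi-atlas or taxize, and the example names are taken from the table below.

```r
# Hypothetical helper: drop a trailing "sp." or "spp." (with or without the dot)
# so that only the genus name is left for the TaxID search.
strip_sp_suffix <- function(x) {
  sub("[[:space:]]+spp?\\.?$", "", trimws(x))
}

participants <- c("Acanthamoeba spp.", "Azotobacter sp.", "Acetobacterium woodii")
cleaned <- strip_sp_suffix(participants)
print(cleaned)

# The cleaned genus names could then presumably be searched with
# taxize::get_uid(..., rank_query = "genus"); this is untested here.
```

Names without such a suffix are returned unchanged, so the helper can be applied to the whole column.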

Moreover, some participant names are fuzzy (e.g. rows 5 and 10), which has two consequences:

  1. They will need renaming, but in a tractable way
  2. The taxonomic resolution might also change (e.g. Participant_1 in row 1 would be resolved only at the genus level instead of species)
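One possible way to handle both consequences for the sp./spp. case is to keep the original name next to the cleaned one and to record the resolution before and after. The sketch below is only an assumption about how this could be done: `sanitize_participant` and its column names are hypothetical, and fuzzy names such as those in rows 5 and 10 are not handled and would still need a manual mapping.

```r
# Hypothetical sketch: keep the original name, the cleaned name and both
# resolutions side by side, so that every renaming stays visible.
sanitize_participant <- function(name, resolution) {
  suffix <- "[[:space:]]+spp?\\.?$"
  has_suffix <- grepl(suffix, name)
  cleaned <- ifelse(has_suffix, sub(suffix, "", name), name)
  # A name reduced to its genus can no longer be at species resolution
  new_resolution <- ifelse(has_suffix & resolution == "species", "genus", resolution)
  data.frame(
    original = name,
    cleaned = cleaned,
    resolution_before = resolution,
    resolution_after = new_resolution,
    stringsAsFactors = FALSE
  )
}

res <- sanitize_participant(
  c("Acanthamoeba spp.", "Acetobacterium woodii", "Bacillus sp."),
  c("species", "species", "species")
)
print(res)
```

Keeping both columns means the original table can be restored at any time, and the rows whose resolution changed can be listed by comparing `resolution_before` with `resolution_after`.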

I am still not sure how to deal with these problems, nor exactly how to sanitize the names and TaxIDs without manual corrections.

                Participant_1                   Participant_2     TR1     TR2
1           Acanthamoeba spp.          Candidatus Procabacter species   genus
2       Acetobacterium woodii         Pelobacter acidigallici species species
3               Acinetobacter              Pseudomonas putida   genus species
4       Alteromonas macleodii                 Prochlorococcus species   genus
5  Ammonia-oxidizing bacteria      Nitrite-oxidizing bacteria   class   class
6            Archaea (ANME-2)              Desulfosarcina sp.  phylum   genus
7        Aspergillus nidulans      Streptomyces rapamycinicus species species
8             Azotobacter sp.                  Alternaria sp. species species
9                Bacillus sp.            Debaryomyces vanriji species species
10         Bacteroides ovatus Bacteroides vulgatus and others species species