QUICC-FOR / QUICCSQL

Merging several forest sample plots databases into one final PostgreSQL database

Species names #1

Open amael-ls opened 8 years ago

amael-ls commented 8 years ago

In the file final_ref_table.csv, there are two names for Pinus banksiana (cf. lines 1150 and 2276).

ltalluto commented 8 years ago

I ran into this with pin ban and a few other spp. Looks like some records just didn't get TSNs associated with them. It's probably safe to just update the NAs to the correct species, no?

MiraBryant commented 8 years ago

I agree, updating is the best idea.

Miranda

On Sep 12, 2016, at 1:30 PM, Matthew Talluto wrote:

I ran into this with pin ban and a few other spp. Looks like some records just didn't get TSNs associated with them. It's probably safe to just update the NAs to the correct species, no?

amael-ls commented 8 years ago

The potentially problematic species: "NA-ACA-ANE" "NA-CAR-CAR" "NA-CAR-OVA" "NA-CHA-NA" "NA-CYA-NA" "NA-EUG-PAL" "NA-EUG-STE" "NA-HED-NA" "NA-LIQ-STY" "NA-MAL-NA" "NA-MOR-NA" "NA-PAR-NA" "NA-PIN-BAN" "NA-PLA-NA" "NA-PRI-LAN" "NA-PSY-MAR" "NA-PTE-MAC" "NA-QUE-MAR" "NA-QUE-PRI"

Here is the small function I used to detect them (it is a quick and dirty solution, sorry): listProblem.R.zip
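For readers without the attachment, the detection step can be sketched as follows. This is not the attached listProblem.R, only a minimal Python illustration that assumes the species code has the form TSN-GEN-SPE seen in this thread, with "NA" in the first field when no TSN is attached. The function name `find_untagged_codes` and the sample list are hypothetical.

```python
def find_untagged_codes(codes):
    """Return the distinct species codes whose TSN field is 'NA'.

    Assumes codes look like 'TSN-GEN-SPE', e.g. '18032-ABI-BAL'.
    """
    # The TSN is everything before the first '-'.
    return sorted(code for code in set(codes) if code.split("-", 1)[0] == "NA")


# Small made-up sample; only '18032-ABI-BAL' and the NA codes come from the thread.
codes = ["18032-ABI-BAL", "NA-PIN-BAN", "NA-LIQ-STY", "NA-CHA-NA", "NA-PIN-BAN"]
print(find_untagged_codes(codes))
```

On the real table, the codes would be read from the last column of final_ref_table.csv.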

amael-ls commented 8 years ago

OK, after comparing the Latin names, I found that only two of these species match an existing entry:

  1. PIN-BAN
  2. LIQ-STY

Therefore, they can be merged.

ltalluto commented 8 years ago

I think there are a few issues going on here. For Pin ban and Liq Sty, there are TSNs for some records and not for others, so the NA records need to be updated to point to the right species key. For others, TSNs (and in some cases, specific epithets) are missing entirely.

For the missing epithets (records ending in -NA), we should verify from the raw data if possible that these records were only genus level observations.

For others, we should add TSNs when they are available. If the species is not listed in ITIS, we should check for synonyms and use the TSN for the synonym.
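The three cases described above could be separated with something like the sketch below. It is only an illustration under assumptions: codes are taken to have the form TSN-GEN-SPE, the function name `triage` is hypothetical, and the TSNs `00001` and `00002` in the sample are placeholders, not real ITIS numbers.

```python
def triage(codes):
    """Sort 'NA-...' species codes into the three cases discussed above."""
    # Map the 'GEN-SPE' suffix to the full code for entries that have a TSN.
    known = {}
    for code in codes:
        tsn, _, suffix = code.partition("-")
        if tsn != "NA":
            known[suffix] = code

    result = {"merge": {}, "genus_only": [], "needs_tsn": []}
    for code in codes:
        tsn, _, suffix = code.partition("-")
        if tsn != "NA":
            continue
        if suffix in known:
            # Same species exists with a TSN: repoint the NA record to it.
            result["merge"][code] = known[suffix]
        elif suffix.endswith("-NA"):
            # Missing epithet: check the raw data for genus-level observations.
            result["genus_only"].append(code)
        else:
            # No match in the table: look up the TSN (or a synonym) in ITIS.
            result["needs_tsn"].append(code)
    return result


# Placeholder TSNs (00001, 00002) are made up for the example.
sample = ["00001-PIN-BAN", "NA-PIN-BAN", "00002-LIQ-STY", "NA-LIQ-STY",
          "NA-CHA-NA", "NA-QUE-PRI"]
print(triage(sample))
```

The ITIS lookup itself is left manual here, since the synonym check needs a human decision.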

amael-ls commented 8 years ago

nbColumns.c++.zip

Another issue (which might not be one...): there are semicolons in the English names of some species. As a result, read.table (and friends) in R cannot read those lines, because the separator is also a semicolon. Here is a C++ function that detects where the problems are. In the file "final_ref_table.csv", I found 78 problems (run the function to get the line numbers). Example, line 11:

18032;"Abies";"balsamea";"Balsam fir ;balsam fir";"Sapin baumier";"SAB";20;"Bf";12;5;"18032-ABI-BAL"

ltalluto commented 8 years ago

read.table handles this fine on my machine. The quotes protect the extra semicolon. Depending on your version/localization of R, you may have to set sep=";", quote='"'
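To illustrate why the quotes matter, here is a small Python sketch (not R, and not the attached C++ function) that parses the example line from above in two ways: a naive split on every semicolon, and a quote-aware parse with the standard `csv` module. The variable names are arbitrary.

```python
import csv
import io

# Example line 11 of final_ref_table.csv, as quoted earlier in this thread.
line = ('18032;"Abies";"balsamea";"Balsam fir ;balsam fir";"Sapin baumier";'
        '"SAB";20;"Bf";12;5;"18032-ABI-BAL"')

# Naive split: every semicolon is a separator, including the quoted one.
naive_fields = line.split(";")

# Quote-aware parse: semicolons inside double quotes stay in the field.
quoted_fields = next(csv.reader(io.StringIO(line), delimiter=";", quotechar='"'))

print(len(naive_fields), len(quoted_fields))  # 12 11
print(quoted_fields[3])                       # Balsam fir ;balsam fir
```

A column counter that ignores quoting reports this line as a problem, while a quote-aware reader sees the expected 11 columns.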

SteveViss commented 8 years ago

For the missing epithets (records ending in -NA), we should verify from the raw data if possible that these records were only genus level observations. From @mtalluto

Yes, you're right. This is the decision we took. Those species have only a genus.

As you suggested, I have to replace the leading NA in the species code string with the right TSN (when possible). We still have to keep in mind that, of the ~2500 total species in the ref_species table, only ~200 are present in the QUICC-FOR database.

amael-ls commented 8 years ago

It seems that some species have synonyms; maybe this is why you could not find a TSN code. Examples:

NA-CAR-ALB; Carya alba; Carya tomentosa
NA-CHA-NOO; Chamaecyparis nootkatensis; Cupressus nootkatensis (changed in 1993)
NA-QUE-PRI; Quercus prinus L.; Quercus montana
NA-TAX-ASC; Taxodium ascendens; Taxodium distichum var. imbricarium (or var. nutans??)

cf ITIS website: http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=183433

ltalluto commented 8 years ago

Yes, this is exactly it. I don't have access to the database from here (I think?), so I can't make the change. You'll have to buy Steve a beer and he can do it :)