Fireandplants / plant_gbif

This repository is for data and scripts related to plant species distribution across the globe using the Global Biodiversity Information Facility (GBIF) dataset.

GBIF name matching results and checking synonymy #4

Closed dmcglinn closed 10 years ago

dmcglinn commented 10 years ago

We recently received a list of name matches from GBIF. Of the 88,135 names we submitted, 83,269 had at least one record in GBIF. If we look only at the names on the Tank tree, 30,903 of the 31,749 names had matches, so 846 names did not yield a match.

We're doing really well at getting a lot of matches. Still, if someone has the energy and time, it would be great to have a script that checks whether the names that did not yield any GBIF records have known synonyms that are not already in our query. Then we could send this larger list back to Jan and have the full query carried out. Also, if we want to check for ourselves whether the GBIF database at least contains a name, the R package rgbif can do this very quickly.

@AmyZanne has suggested that synonyms were included when the original names list was put together, but I have not had a chance to look at this carefully.

One thing I did do to get us started down this path was to bounce all the 83,269 names we submitted to GBIF off TNRS, which returned a lot of information on whether a name is accepted, which organization accepts it, and whether there are known synonyms. This information is stored in taxa_for_big_phylo_tnrs.csv (https://github.com/Fireandplants/plant_gbif/blob/0091a1ffe211ffe82c59c59416227d2cbae7b9fe/query_names/taxa_for_big_phylo_tnrs.csv).
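For illustration, a minimal Python sketch of the kind of script described above might look like this. It assumes the submitted names, the matched names, and a name-to-synonyms mapping are already loaded in memory; the function name `new_synonyms_to_query` and the toy names are hypothetical, and nothing here assumes the actual column layout of taxa_for_big_phylo_tnrs.csv.

```python
def new_synonyms_to_query(submitted, matched, synonyms):
    """Return synonyms of unmatched names that are not already in the query.

    submitted: iterable of names sent to GBIF
    matched:   iterable of names that returned at least one GBIF record
    synonyms:  dict mapping a name to a list of its known synonyms
    """
    submitted = set(submitted)
    unmatched = submitted - set(matched)
    extra = set()
    for name in unmatched:
        for syn in synonyms.get(name, []):
            # Only keep synonyms we have not already queried.
            if syn not in submitted:
                extra.add(syn)
    return sorted(extra)


# Hypothetical toy data, for illustration only.
submitted = ["Aus bus", "Aus cus", "Dus eus"]
matched = ["Aus bus"]
synonyms = {
    "Aus cus": ["Xus cus", "Aus bus"],  # "Aus bus" is already in the query
    "Dus eus": ["Dus fus"],
}
extra_names = new_synonyms_to_query(submitted, matched, synonyms)
print(extra_names)
```

The returned list would be the additional names to append to the query before sending it back.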

ejforrestel commented 10 years ago

I think this is where getting the updated Plant List would be useful -- we can then match all species to the list and compile a synonymy list -- much like what was done already (but with the older version of the plant list). I have never used TNRS before to match names...it uses some of the same sources as the Plant List.

dschwilk commented 10 years ago

Hi guys,

Sorry if I am being overly pedantic/obvious, but the synonym steps as I see them are:

1) Whether we use an old or new version of the Plant List, we do need to have a lookup table from canonical name (Tank tree taxon label) to multiple possible synonyms. Then each synonym plus each canonical name is added to the GBIF query list.

2) We obtain GBIF results for each name in the query and then merge all results by synonym to canonical name using a reverse lookup on the above table. Now our results match the Tank tree (and any existing trait data, e.g., Zanne et al., that is matched to those names).
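The two steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the lookup table is taken to be a dict from canonical name to a list of synonyms, GBIF results are represented as simple `(queried name, record)` pairs, and all function names and sample values are hypothetical.

```python
from collections import defaultdict


def build_query_list(canonical_to_synonyms):
    """Step 1: every canonical name plus each of its synonyms."""
    names = set()
    for canonical, syns in canonical_to_synonyms.items():
        names.add(canonical)
        names.update(syns)
    return sorted(names)


def build_reverse_lookup(canonical_to_synonyms):
    """Map every queried name (canonical or synonym) to a canonical name."""
    # A canonical name always maps to itself.
    reverse = {name: name for name in canonical_to_synonyms}
    for canonical, syns in canonical_to_synonyms.items():
        for syn in syns:
            # If a synonym is listed under several names, the first one wins.
            reverse.setdefault(syn, canonical)
    return reverse


def merge_by_canonical(records, reverse):
    """Step 2: group records under the canonical (tree) name."""
    merged = defaultdict(list)
    for name, record in records:
        canonical = reverse.get(name)
        if canonical is not None:
            merged[canonical].append(record)
    return dict(merged)


# Hypothetical toy data, for illustration only.
table = {"Aus bus": ["Xus bus"], "Cus dus": []}
query = build_query_list(table)
reverse = build_reverse_lookup(table)
records = [("Xus bus", "rec1"), ("Aus bus", "rec2"), ("Cus dus", "rec3")]
merged = merge_by_canonical(records, reverse)
print(query)
print(merged)
```

One design question the sketch leaves open is what to do when a synonym is listed under more than one canonical name; here the first mapping is kept, which may not be the right policy for the real table.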

So, I assume that the list that Dan just submitted resulted from step 1, above? Using a table created from an older version of the Plant List? Where is that table? I have not seen it. We should do the merge (step 2) using the same table. If this doesn't exist, then doesn't that necessitate making a new synonym table from some source and then repeating steps 1 and 2?

It would necessitate a new GBIF query, but if the Plant List folks get back to Beth or me, then we /could/ create a new table, if needed. I think Amy's email suggested that they won't get back to us because the data is not really "shared" readily. But I can write a scraper, I suppose. It depends on how their server responds to automated requests. And then there is the work of turning that into a lookup table. I will also look at the TNRS data Dan queried, as that could be an alternative synonym source. Amy was concerned about using a newer table, but it should not be a problem if we do step 1 right before the GBIF query and step 2 right after. That way the canonical names are the Tank tree ones. Now, this does not help us match synonyms against other trait data, but we could always do that split/merge again for matching against trait databases not already lined up with Tank tree taxon labels.

Sorry that is so wordy. I'm leaving for the field (Guadalupe Mountains) early tomorrow morning, so I'll be out of touch for a few days.

-Dylan

AmyZanne commented 10 years ago

I have 3 different avenues for name scrubbing.

  1. taxize in R allows scrubbing against the Plant List. It uses the current version. I think it is pretty slow.
  2. From the tempo and mode WG, I have cleaning scripts from Will C. I have the script for handling oddities in names (weird spaces, characters, etc.); he then used fuzzy matching to go after spelling errors, gender changes, etc. I don't have the latter.
  3. More recently, Will Pearse wrote a script for a SESYNC meeting and is willing to share. I have to check with him about which is the right code/Plant List file. It is the old Plant List, of which I have a copy (~54 MB). Let me check with him and I will get back to you.
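As a rough illustration of the cleaning and fuzzy-matching idea in item 2, here is a minimal Python sketch using only the standard library. It is not the script mentioned above: the function names, the character whitelist, and the 0.85 similarity cutoff are all assumptions chosen for the example, and the whitelist would wrongly strip legitimate characters such as the hybrid sign.

```python
import difflib
import re


def clean_name(raw):
    """Collapse odd whitespace and drop characters other than letters, hyphens, spaces."""
    name = raw.replace("\u00a0", " ")          # non-breaking space to plain space
    name = re.sub(r"[^A-Za-z\- ]", " ", name)  # assumption: plain Latin letters only
    name = re.sub(r"\s+", " ", name).strip()   # collapse runs of whitespace
    return name


def fuzzy_match(name, reference, cutoff=0.85):
    """Return the closest reference name at or above the cutoff, or None."""
    matches = difflib.get_close_matches(name, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical reference list, for illustration only.
reference = ["Quercus alba", "Quercus rubra"]
cleaned = clean_name("  Quercus\u00a0 alba_ ")
gender_variant = fuzzy_match("Quercus albus", reference)
no_match = fuzzy_match("Pinus strobus", reference)
print(cleaned, gender_variant, no_match)
```

The cutoff is the main knob: set too low, distinct species in the same genus start matching each other, so any fuzzy matches would need to be reviewed by hand.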

Amy

Dr. Amy Zanne
Department of Biological Sciences
2023 G St. NW
George Washington University
Washington, DC 20052

Office: 352 Lisner Hall
Office Phone: (202) 994-8751
Lab: 409 Bell Hall
Lab Phone: (202) 994-9613
Fax: (202) 994-6100
Website: http://www.phylodiversity.net/azanne/

ejforrestel commented 10 years ago

I just figured out how to scrape the plant list by family (so it will include all species and their synonyms for Angiosperms, Gymnosperms, Ferns and Bryophytes) so I can put up a version of the Plant List v 1.1 to use with whatever scripts in the next day or so -- but if we already have it let me know and I won't continue.
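Once the per-family files are downloaded, turning them into a synonym lookup table could be sketched in Python as below. The file layout is purely an assumption for illustration: each family file is taken to be a CSV with hypothetical columns `name`, `status`, and `accepted_name`, which may not match what the Plant List actually serves.

```python
import csv
import io
from collections import defaultdict


def synonym_table(family_files):
    """Build {accepted name: sorted list of synonyms} from per-family CSV files.

    family_files: iterable of open text files, each assumed (hypothetically)
    to have the columns 'name', 'status', and 'accepted_name'.
    """
    table = defaultdict(set)
    for handle in family_files:
        for row in csv.DictReader(handle):
            if row["status"] == "Accepted":
                table[row["name"]]  # touch the key so names without synonyms appear
            elif row["status"] == "Synonym":
                table[row["accepted_name"]].add(row["name"])
    return {name: sorted(syns) for name, syns in table.items()}


# Hypothetical stand-ins for two downloaded family files.
family_a = io.StringIO(
    "name,status,accepted_name\n"
    "Aus bus,Accepted,\n"
    "Xus bus,Synonym,Aus bus\n"
)
family_b = io.StringIO(
    "name,status,accepted_name\n"
    "Cus dus,Accepted,\n"
)
table = synonym_table([family_a, family_b])
print(table)
```

With real downloads, the `io.StringIO` objects would be replaced by files opened from disk, and the column names adjusted to whatever the downloaded files contain.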

dmcglinn commented 10 years ago

Hey Beth,

I think the rOpenSci package taxize also provides the Plant List at the family level.

https://github.com/ropensci/taxize/

Daniel J. McGlinn, PhD Postdoctoral Researcher Utah State University Department of Biology, BNR 132 Logan, UT 84322-5305 http://mcglinn.web.unc.edu/ cell: 405-612-1780

ejforrestel commented 10 years ago

Hey Dan~

It does, but taxize is pretty slow, so it would take a while to get the complete list of species. I am just downloading all the species by family using wget, which is pretty fast, and this way we will have the complete list of species on the Plant List.

AmyZanne commented 10 years ago

Sounds awesome.

dmcglinn commented 10 years ago

Oh sweet! Sounds great. You rock!


dschwilk commented 10 years ago

Great Beth!

I can work on code to turn that into a synonym table when I get back from the field.

Thanks Amy --- I was avoiding those other issues and just worrying about the synonyms as a separate problem from name misspellings, character encodings, etc. Hm, those other issues are kind of a hard problem --- at least when going in one direction. Not so bad if one has complete access to the data on both ends, but it is hard to anticipate those things when creating a query (e.g. GBIF). For matching against trait databases, when we have the database itself, we can use what you have that you can share, or we can re-implement it.
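
For reference, here is a rough sketch of what a synonym table and the two lookup steps could look like in Python. It assumes the scraped Plant List can be reduced to (canonical name, synonym) pairs; the function names and the placeholder taxon names ("Aus bus" etc.) are made up for illustration and are not taken from any existing script in this repository.

```python
from collections import defaultdict


def build_synonym_table(pairs):
    """Map each canonical (tree) name to the set of its known synonyms."""
    table = defaultdict(set)
    for canonical, synonym in pairs:
        if synonym != canonical:  # skip rows where the name is its own synonym
            table[canonical].add(synonym)
    return dict(table)


def expand_query(canonical_names, table):
    """Step 1: canonical names plus all of their synonyms, de-duplicated and sorted."""
    names = set(canonical_names)
    for name in canonical_names:
        names.update(table.get(name, ()))
    return sorted(names)


def reverse_lookup(table):
    """Map every synonym, and each canonical name itself, back to a canonical name.

    If a synonym is listed under more than one canonical name, the first one
    encountered wins; a real table would need an explicit rule for that case.
    """
    reverse = {canonical: canonical for canonical in table}
    for canonical, synonyms in table.items():
        for synonym in synonyms:
            reverse.setdefault(synonym, canonical)
    return reverse


def merge_counts(record_counts, reverse):
    """Step 2: sum per-name record counts under the canonical name."""
    merged = defaultdict(int)
    for name, count in record_counts.items():
        # Names absent from the table are treated as already canonical.
        merged[reverse.get(name, name)] += count
    return dict(merged)


# Placeholder names only, not real taxa.
table = build_synonym_table([("Aus bus", "Aus oldus"), ("Aus bus", "Xus bus")])
query = expand_query(["Aus bus", "Cus dus"], table)
merged = merge_counts({"Aus bus": 3, "Aus oldus": 2, "Cus dus": 5}, reverse_lookup(table))
```

The same table is used in both directions, so whichever version of the Plant List it is built from, the merged results end up keyed by the tree's taxon labels.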

-Dylan


AmyZanne commented 10 years ago

Alright, pretty sure this is the code I was using from Will P to go with the old plant list names. It uses taxize with a local version of the plant list.

AmyZanne commented 10 years ago

I'd say it's better to fix the errors before checking for synonymy. Otherwise you will just have to remap them.

Also, there is only a limited number of problems worth checking for before it's no longer worth the time to catch the rest. Will P's script does a little of that: it looks for weird things between genus and species and then removes anything that comes in as a "sp." of some sort. For that group we debated a lot how to deal with hybrids. Since it's about economic plants, we kept them. For tempo and mode we also kept them. It might be worth having a discussion about what to do with hybrids here.

For what it's worth, here is Will C and Ginger Jui's scrubbing script, and what Will C has to say about it: "here is the scrubbing script, which handles weird names. Spelling and synonymy are in a different script but that one is a bit trickier to work with...."

When you are dealing with 80K+ taxa, losing a few thousand doesn't turn out to be horrible.
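
To make the order of operations concrete, here is a minimal Python sketch of the kind of scrubbing pass described above, run before any synonym matching. It is not Will C's, Ginger Jui's, or Will P's actual script: the function name, the regular expression, and the choice to flag hybrids rather than drop them are all assumptions for illustration, and the taxon names are placeholders.

```python
import re

# Assumed pattern for indeterminate names such as "sp", "sp.", "spp", "spp."
_SP_PATTERN = re.compile(r"^spp?\.?$", re.IGNORECASE)
# Assumed hybrid markers: a lone "x" or the multiplication sign.
_HYBRID_MARKERS = {"x", "\u00d7"}


def scrub_name(raw):
    """Return (clean_name, is_hybrid), or (None, False) if the name should be dropped."""
    # Treat underscores as spaces; split() also collapses odd runs of whitespace.
    tokens = raw.replace("_", " ").split()
    if len(tokens) < 2:
        return None, False  # empty string or genus only
    is_hybrid = any(t.lower() in _HYBRID_MARKERS for t in tokens[1:])
    rest = [t for t in tokens[1:] if t.lower() not in _HYBRID_MARKERS]
    if not rest or _SP_PATTERN.match(rest[0]):
        return None, False  # "Genus sp." and similar
    genus = tokens[0].capitalize()
    epithet = rest[0].lower()
    # Hybrids are kept and flagged so the keep/drop decision can be made later.
    name = f"{genus} x {epithet}" if is_hybrid else f"{genus} {epithet}"
    return name, is_hybrid
```

Anything past the first epithet (authorities, infraspecific ranks) is simply discarded here, which may or may not be what we want for this project.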

On Thu, Mar 20, 2014 at 4:04 PM, Dylan Schwilk notifications@github.comwrote:

Great Beth!

I can work on code to turn that into a synonym table when I get back from the field.

Thanks Amy --- I was avoiding those other issues and just worrying about the synonyms as a separate problem from name mispellings, character encodings, etc. Hm, those other issues are kind of a hard problem --- at least when going in one direction. Not so bad if one has the complete access to the data on both ends, but it is hard to anticipate those things for creating a query (eg GBIF). For matching against trait databases when we have the database itself those, then we can use what you have that you can share or we can re-implement.

-Dylan

On 03/20/2014 02:47 PM, ejforrestel wrote:

I just figured out how to scrape the plant list by family (so it will include all species and their synonyms for Angiosperms, Gymnosperms, Ferns and Bryophytes) so I can put up a version of the Plant List v 1.1 to use with whatever scripts in the next day or so -- but if we already have it let me know and I won't continue.

On Thu, Mar 20, 2014 at 12:42 PM, AmyZanne notifications@github.com wrote:

I have 3 different avenues for name scrubbing.

  1. Taxize in R allows scrubbing and you can use the Plant List. It uses the current version. I think this is pretty slow.
  2. From tempo and mode WG, I have from Will C cleaning scripts. I have a script for handling oddities in names (weird spaces, characters, etc.) and then he used fuzzy matching to go after spelling errors/gender changes, etc. I don't have the latter.
  3. More recently, Will Pearse wrote script for a Sesync meeting and is willing to share. I have to check with him about which is the right code/plant list file. It is the old plant list of which I have a copy (~54 MB). Let me check with him and will get back to you.

Amy

On Thu, Mar 20, 2014 at 2:38 PM, Dylan Schwilk < notifications@github.com

wrote:

Hi guys,

Sorry if I am being overly pedantic/obvious, but to the synonym steps as I see them are

1) Whether we use an old or new version of the plant list, we do need to have a lookup table from canonical name (Tank Tree taxon label) to multiple possible synonyms. Then each synonym plus each canonical name is added to the GBIF query list.

2) We obtain GBIF results for each name in the query and now merge all results by synonym to canonical name using a reverse lookup on the above table. Now our results match the Tank tree (and any existing trait data, eg Zanne et al, that is matched to those names)

So, I assume that the list that Dan just submitted resulted from a step 1, above? Using a table created from an older version of the Plant List? Where is that table? I have not seen it. We should do the merge (step 2) using the same table. If this doesn't exist, then doesn't that necessitate making a new synonym table from some source and then repeating steps 1 and 2?

It would necessitate a new GBIF query, but if the Plant List folks get back to Beth or me, then we /could/ create a new table, if needed. I think Amy's email suggested that they won't get back to us because the data is not really "shared" readily. But I can write a scraper, I suppose. It depends how their server responds to automated requests. And then there is the work of turning that into a lookup table. I will also look at the TNRS data Dan queried, as that could be an alternative synonym source. Amy was concerned about using a newer table, but it should not be a problem if we do step 1 right before the GBIF query and step 2 right after. That way the canonical names are the Tank tree ones. Now, this does not help us match synonyms against other trait data, but we could always do that split/merge again for matching against trait databases not already lined up with Tank tree taxon labels.

Sorry that is so wordy. I'm leaving for the field (Guadalupe Mountains) early tomorrow morning, so I'll be out of touch for a few days.

-Dylan
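
The two synonym steps described above (expand the canonical names into a query list, then fold the results back onto the canonical names with a reverse lookup) can be sketched roughly as below. The table contents are made-up placeholder names, not real taxa, and the function names and record format are assumptions for illustration; the real lookup table would come from the Plant List or TNRS.

```python
from collections import defaultdict

# Hypothetical lookup table: canonical (Tank tree) label -> known synonyms.
# "Aus bus" and friends are placeholders, not real species names.
synonyms = {
    "Aus bus": ["Aus cus", "Xus bus"],
    "Dus eus": [],
}


def build_query_list(lookup):
    """Step 1: every canonical name plus each of its synonyms."""
    names = set(lookup)
    for syns in lookup.values():
        names.update(syns)
    return sorted(names)


def build_reverse_lookup(lookup):
    """Map every queried name (canonical or synonym) to its canonical name."""
    # Canonical names always map to themselves.
    reverse = {canonical: canonical for canonical in lookup}
    for canonical, syns in lookup.items():
        for syn in syns:
            # If a synonym is listed under two canonical names, keep the first.
            reverse.setdefault(syn, canonical)
    return reverse


def merge_by_canonical(records, reverse):
    """Step 2: group (queried_name, record) pairs under the canonical name."""
    merged = defaultdict(list)
    for queried_name, record in records:
        merged[reverse[queried_name]].append(record)
    return dict(merged)


query_names = build_query_list(synonyms)
reverse = build_reverse_lookup(synonyms)
# Stand-in for GBIF results: one record per queried name.
results = [("Aus cus", "rec1"), ("Aus bus", "rec2"), ("Dus eus", "rec3")]
merged = merge_by_canonical(results, reverse)
```

Because both steps read the same table, the merged results stay keyed by Tank tree labels regardless of which version of the synonym source was used to build it.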

Dr. Amy Zanne
Department of Biological Sciences
2023 G St. NW
George Washington University
Washington, DC 20052

Office: 352 Lisner Hall
Office Phone: (202) 994-8751
Lab: 409 Bell Hall
Lab Phone: (202) 994-9613
Fax: (202) 994-6100
Website: http://www.phylodiversity.net/azanne/

dschwilk commented 10 years ago

On 03/20/2014 03:13 PM, AmyZanne wrote:

I'd say it's better to fix the errors before checking for synonymy. Otherwise you will just have to remap them.

TRUE!

AmyZanne commented 10 years ago

And by you I mean we :). I am just the weak link in the coding ability here.

ejforrestel commented 10 years ago

Do you guys want me to upload the plant list to github when it finishes downloading?

dmcglinn commented 10 years ago

yep you can pop it in the folder called ./query_names

dan

Daniel J. McGlinn, PhD Postdoctoral Researcher Utah State University Department of Biology, BNR 132 Logan, UT 84322-5305 http://mcglinn.web.unc.edu/ cell: 405-612-1780

ejforrestel commented 10 years ago

Will do. I'll also put up the script I used to download it.

dmcglinn commented 10 years ago

awesome! you can put the script in the ./scripts directory.

thanks dan

ejforrestel commented 10 years ago

Ugh, it seemed too good to be true... and it is. For some reason only 515K species downloaded for angiosperms, and there should be close to 1 million. Amy, could you upload the old list so I can compare and see what might be up?

Thanks, Beth

AmyZanne commented 10 years ago

Wait, I just remembered that Will C, Rich Fitzjohn, and Matt Pennell have pulled together all the code and data for our "How much of the world is woody" paper that just got accepted.

Rich's code is here:

AmyZanne commented 10 years ago

Sorry, was trying to paste and instead it sent: https://github.com/richfitz/wood/

Rich writes this:

Avoid hammering TPL

This fetches a set of data that I've archived.

make theplantlist-cache-unpack

This route allows you to delete all the data (make purge) and easily rerun the analysis (make theplantlist-cache-unpack all) without redownloading the data.
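
The idea behind that cache (download once, then rerun from local copies so the Plant List server is not hit repeatedly) can be illustrated with a small sketch. This is not Rich's Makefile or code; `cached_fetch`, the one-file-per-key layout, and the `fake_fetch` stand-in are all assumptions made for illustration, and no real download is performed.

```python
from pathlib import Path
import tempfile


def cached_fetch(key, fetch, cache_dir):
    """Return the text for `key`, calling `fetch(key)` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{key}.txt"
    if path.exists():
        # Cache hit: reuse the local copy and skip the remote request.
        return path.read_text(encoding="utf-8")
    text = fetch(key)
    path.write_text(text, encoding="utf-8")
    return text


calls = []


def fake_fetch(family):
    # Stand-in for a real per-family download; records each call made.
    calls.append(family)
    return f"names for {family}\n"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("Fagaceae", fake_fetch, tmp)
    second = cached_fetch("Fagaceae", fake_fetch, tmp)
```

Deleting the cache directory forces a fresh download on the next run, which is roughly the role the purge step plays in the description above.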

AmyZanne commented 10 years ago

And, here's where they made the code transparent. http://richfitz.github.io/wood/wood.html

They have worked with the most recent plant list to do the cleaning. So we should be able to use their plant list build and cleaning scripts.

ejforrestel commented 10 years ago

This is great! I am using code essentially similar to what Rich used, so it is odd that I am retrieving only a subset of the list -- but you already have it...

AmyZanne commented 10 years ago

Sorry I didn't think of it sooner. They just updated to the new plant list last week I think and Rich emailed me a couple days ago.

On Thu, Mar 20, 2014 at 5:13 PM, ejforrestel notifications@github.comwrote:

This is great! I am essentially using similar code that Rich used, so it is odd that I am retrieving a subset of the list -- but you already have it...

On Thu, Mar 20, 2014 at 2:08 PM, AmyZanne notifications@github.com wrote:

And, here's where they made the code transparent. http://richfitz.github.io/wood/wood.html

They have worked with the most recent plant list to do the cleaning. So we should be able to use their plant list build and cleaning scripts.

On Thu, Mar 20, 2014 at 5:05 PM, Amy Zanne aezanne@gmail.com wrote:

Sorry, was trying to paste and instead it sent: https://github.com/richfitz/wood/

Rich writes this:

Avoid hammering TPL

This fetches a set of data that I've archived.

make theplantlist-cache-unpack

This route allows you to delete all the data (make purge) and easily rerun the analysis (make theplantlist-cache-unpack all) without redownloading the data.

On Thu, Mar 20, 2014 at 5:03 PM, Amy Zanne aezanne@gmail.com wrote:

Wait, I just remembered that Wil C, Rich Fitzjohn, and Matt Pennell have pulled together all the code and data for our "How much of the world is woody" paper that just got accepted.

Rich's code is here:

On Thu, Mar 20, 2014 at 4:58 PM, Dan McGlinn < notifications@github.com wrote:

awesome! you can put the script in the ./scripts directory.

thanks dan

On Thu, Mar 20, 2014 at 2:51 PM, ejforrestel < notifications@github.com

wrote:

will do, Ill also throw the script up I used to download it

On Thu, Mar 20, 2014 at 1:51 PM, Dan McGlinn < notifications@github.com

wrote:

yep you can pop it in the folder called ./query_names

dan

On Thu, Mar 20, 2014 at 2:47 PM, ejforrestel < notifications@github.com

wrote:

Do you guys want me to upload the plant list to github when it finishes downloading?

On Thu, Mar 20, 2014 at 1:35 PM, AmyZanne < notifications@github.com> wrote:

And by you I mean we :). I am just the weak link in the coding ability here.

On Thu, Mar 20, 2014 at 4:29 PM, Dylan Schwilk <notifications@github.com> wrote:

On 03/20/2014 03:13 PM, AmyZanne wrote:

I'd say it's better to fix the errors before checking for synonymy. Otherwise you will just have to remap them.

TRUE!

Dr. Amy Zanne
Department of Biological Sciences
2023 G St. NW
George Washington University
Washington, DC 20052

Office: 352 Lisner Hall
Office Phone: (202) 994-8751
Lab: 409 Bell Hall
Lab Phone: (202) 994-9613
Fax: (202) 994-6100
Website: http://www.phylodiversity.net/azanne/

Daniel J. McGlinn, PhD Postdoctoral Researcher Utah State University Department of Biology, BNR 132 Logan, UT 84322-5305 http://mcglinn.web.unc.edu/ cell: 405-612-1780

ejforrestel commented 10 years ago

No worries, do you have the current list then -- can you put it up on the fire plants repo?

AmyZanne commented 10 years ago

No, I haven't gone through his code. I just talked to Will C, who told me this is what they did, and then looked at Rich's code just now. I can look at it this weekend to see if I can grab it.

ejforrestel commented 10 years ago

got it ~ if it's easier I can run it too and see if I still get the same list as I am getting currently.

AmyZanne commented 10 years ago

if you have time, give it a go and see how you make out. i guarantee you will be faster than i. if there's any drama let me know. worst case i can get in touch with one of those guys.

ejforrestel commented 10 years ago

Hey guys~

So the reason I got only a subset of what the Plant List includes is that the CSV links give you only the accepted and unresolved names. The only way to actually get the synonyms is to search for each individual species -- I don't know how to get around this problem. Any thoughts? Is this a new feature? It seems pretty clear that the Plant List does not want to make their dataset freely available...

Beth
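
One quick way to confirm the point above (that a CSV export holds only accepted and unresolved names) is to tally the status values in the downloaded file. The following is a minimal Python sketch, not the script used in this thread. The column name "Taxonomic status in TPL" is an assumption and should be checked against the header row of the actual export. The sample rows, including the species name "Quercus exampleensis", are made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumed column name; check the header row of the actual export.
STATUS_COLUMN = "Taxonomic status in TPL"


def count_statuses(csv_text, status_column=STATUS_COLUMN):
    """Tally how many rows carry each taxonomic status value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or status_column not in reader.fieldnames:
        raise KeyError(f"column {status_column!r} not found in CSV header")
    # Missing cells come back as None, so fall back to an empty string.
    return Counter((row[status_column] or "").strip() for row in reader)


def has_synonyms(counts):
    """Return True if any row is labelled as a synonym (case-insensitive)."""
    return any(status.lower() == "synonym" and n > 0
               for status, n in counts.items())


# Made-up sample rows, for illustration only.
SAMPLE = """Genus,Species,Taxonomic status in TPL
Quercus,alba,Accepted
Quercus,exampleensis,Unresolved
Acer,rubrum,Accepted
"""

counts = count_statuses(SAMPLE)
print(dict(counts))
print("contains synonyms:", has_synonyms(counts))
```

For a real download, the file contents can be read with `open(path, encoding="utf-8").read()` and passed to `count_statuses`, where `path` is wherever the export was saved. If no row is labelled as a synonym, the export would be consistent with the behaviour described above.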

On Thu, Mar 20, 2014 at 5:41 PM, AmyZanne notifications@github.com wrote:

if you have time, give it a go and see how you make out. i guarantee you will be faster than i. if there's any drama let me know. worst case i can get in touch with one of those guys.

On Thu, Mar 20, 2014 at 5:38 PM, ejforrestel <notifications@github.com

wrote:

got it ~ if it's easier I can run it too and see if I still get the same list as I am currently.

On Thu, Mar 20, 2014 at 2:36 PM, AmyZanne notifications@github.com wrote:

No I haven't gone through his code. I just talked to Will C who told me this is what they did and then looked at Rich's code just now. I can look at it this weekend to see if I can grab it.

On Thu, Mar 20, 2014 at 5:21 PM, ejforrestel <notifications@github.com

wrote:

No worries, do you have the current list then -- can you put it up on the fire plants repo?

On Thu, Mar 20, 2014 at 2:17 PM, AmyZanne notifications@github.com wrote:

Sorry I didn't think of it sooner. They just updated to the new plant list last week I think and Rich emailed me a couple days ago.

On Thu, Mar 20, 2014 at 5:13 PM, ejforrestel < notifications@github.com

wrote:

This is great! I am essentially using similar code that Rich used, so it is odd that I am retrieving a subset of the list -- but you already have it...

On Thu, Mar 20, 2014 at 2:08 PM, AmyZanne < notifications@github.com> wrote:

And, here's where they made the code transparent. http://richfitz.github.io/wood/wood.html

They have worked with the most recent plant list to do the cleaning. So we should be able to use their plant list build and cleaning scripts.

On Thu, Mar 20, 2014 at 5:05 PM, Amy Zanne aezanne@gmail.com wrote:

Sorry, was trying to paste and instead it sent: https://github.com/richfitz/wood/

Rich writes this:

Avoid hammering TPL

This fetches a set of data that I've archived.

make theplantlist-cache-unpack

This route allows you to delete all the data (make purge) and easily rerun the analysis (make theplantlist-cache-unpack all) without redownloading the data.

On Thu, Mar 20, 2014 at 5:03 PM, Amy Zanne < aezanne@gmail.com> wrote:

Wait, I just remembered that Wil C, Rich Fitzjohn, and Matt Pennell have pulled together all the code and data for our "How much of the world is woody" paper that just got accepted.

Rich's code is here:

On Thu, Mar 20, 2014 at 4:58 PM, Dan McGlinn < notifications@github.com wrote:

awesome! you can put the script in the ./scripts directory.

thanks dan

On Thu, Mar 20, 2014 at 2:51 PM, ejforrestel < notifications@github.com

wrote:

will do, Ill also throw the script up I used to download it

On Thu, Mar 20, 2014 at 1:51 PM, Dan McGlinn < notifications@github.com

wrote:

yep you can pop it in the folder called ./query_names

dan

On Thu, Mar 20, 2014 at 2:47 PM, ejforrestel < notifications@github.com

wrote:

Do you guys want me to upload the plant list to github when it finishes downloading?

On Thu, Mar 20, 2014 at 1:35 PM, AmyZanne < notifications@github.com> wrote:

And by you I mean we :). I am just the weak link in the coding ability here.

On Thu, Mar 20, 2014 at 4:29 PM, Dylan Schwilk <notifications@github.com> wrote:

On 03/20/2014 03:13 PM, AmyZanne wrote:

I'd say it's better to fix the errors before checking for synonymy. Otherwise you will just have to remap them.

TRUE!

Reply to this email directly or view it on GitHub: https://github.com/Fireandplants/plant_gbif/issues/4#issuecomment-38217077

Dr. Amy Zanne
Department of Biological Sciences
2023 G St. NW
George Washington University
Washington, DC 20052

Office: 352 Lisner Hall
Office Phone: (202) 994-8751
Lab: 409 Bell Hall
Lab Phone: (202) 994-9613
Fax: (202) 994-6100
Website: http://www.phylodiversity.net/azanne/

Daniel J. McGlinn, PhD Postdoctoral Researcher Utah State University Department of Biology, BNR 132 Logan, UT 84322-5305 http://mcglinn.web.unc.edu/ cell: 405-612-1780

ejforrestel commented 10 years ago

Okay, unless someone interjects here with a better idea (of which I am certain there is one!), I will first download all the accepted names and subsequently write a script to pull all the synonyms one species at a time -- not sure what alternative there is!

On Wed, Mar 26, 2014 at 4:30 PM, Beth Forrestel <ejforrestel@gmail.com> wrote:

Hey guys~

So the reason why I got a subset of what the Plant List includes is that you only get the accepted and unresolved names when you pull off the csv links. The only way you actually get the synonyms is if you search for each individual species -- I don't know how to get around this problem. Any thoughts? Is this a new feature? It seems pretty clear that the Plant List does not want to make their dataset freely available...

Beth

On Thu, Mar 20, 2014 at 5:41 PM, AmyZanne <notifications@github.com> wrote:

if you have time, give it a go and see how you make out. i guarantee you will be faster than i. if there's any drama let me know. worst case i can get in touch with one of those guys.

On Thu, Mar 20, 2014 at 5:38 PM, ejforrestel <notifications@github.com> wrote:

got it ~ if it's easier I can run it too and see if I still get the same list as I am currently.

On Thu, Mar 20, 2014 at 2:36 PM, AmyZanne notifications@github.com wrote:

No I haven't gone through his code. I just talked to Will C who told me this is what they did and then looked at Rich's code just now. I can look at it this weekend to see if I can grab it.

On Thu, Mar 20, 2014 at 5:21 PM, ejforrestel <notifications@github.com> wrote:

No worries, do you have the current list then -- can you put it up on the fire plants repo?

On Thu, Mar 20, 2014 at 2:17 PM, AmyZanne <notifications@github.com> wrote:

Sorry I didn't think of it sooner. They just updated to the new plant list last week I think and Rich emailed me a couple days ago.

dmcglinn commented 10 years ago

This seems like a ton of work to get this list - I don't think reviewers can expect us to use a list that is not being made easily available. How much of an improvement are we likely to see using the Plant List relative to the data sources that TNRS uses? I'm a noob when it comes to plant taxonomy, so I don't know the answers to these questions.

dschwilk commented 10 years ago

Hi folks,

I'm back from the field and will set aside some time tomorrow to compare the TNRS results and The Plant List results. Even spacing queries to the TPL server 10 sec apart, 32k species is almost 90 hours! Is that what Rich's code does? Sorry I haven't looked at his yet. I will tomorrow.
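
For reference, the back-of-the-envelope estimate can be checked with a couple of lines of Python. This is only the arithmetic from the comment (one query per species, a fixed 10-second gap); the function name is made up for illustration.

```python
def total_query_hours(n_species: int, delay_seconds: float) -> float:
    """Hours needed to issue one query per species with a fixed delay per query."""
    return n_species * delay_seconds / 3600  # 3600 seconds in an hour


hours = total_query_hours(32_000, 10)
print(f"{hours:.1f} hours")  # 88.9 hours
```

Halving the delay halves the total, so even at 5 seconds per query the full list would still take nearly two days.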

ejforrestel commented 10 years ago

His code only gets the accepted names, which is what you need to use to pull the synonyms. You can only retrieve the synonyms list via downloading a csv file by searching the accepted name. I am working on this and will give you an update by tomorrow!
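
A rough sketch of what a one-species-at-a-time synonym pull could look like, in Python. This is not the script mentioned above: `collect_synonyms` and `fetch_synonyms` are hypothetical names, the real per-species download is left as a function you would supply, and the species names in the demo are made-up placeholders. It only illustrates spacing out the requests and skipping names that were already fetched.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


def collect_synonyms(
    accepted_names: Iterable[str],
    fetch_synonyms: Callable[[str], List[str]],
    delay_seconds: float = 10.0,
    cache: Optional[Dict[str, List[str]]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Look up synonyms one accepted name at a time, pausing between requests."""
    results = {} if cache is None else cache
    made_request = False
    for name in accepted_names:
        if name in results:
            continue  # already fetched earlier; no request needed
        if made_request:
            sleep(delay_seconds)  # space out requests to avoid hammering the server
        results[name] = fetch_synonyms(name)
        made_request = True
    return results


# Demo with a made-up lookup table standing in for the real per-species download.
fake_table = {"Aus bus": ["Aus cus", "Aus dus"], "Eus fus": []}
pauses = []
synonyms = collect_synonyms(
    ["Aus bus", "Eus fus", "Aus bus"],
    fetch_synonyms=lambda name: fake_table[name],
    sleep=pauses.append,  # record pauses instead of actually waiting
)
print(synonyms)
print(len(pauses))
```

Passing in a `cache` dictionary loaded from disk would let an interrupted run pick up where it left off instead of re-querying names it already has.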

AmyZanne commented 10 years ago

Hi Beth, I am sorry. I don't have a good solution. I think we did some of our mapping earlier with IPNI, which sucked. Then we brought everything up to the Plant List. But it was a pain in the ass getting it. They were not helpful.

My best suggestion would be to check in with the 2 other groups I have worked with who have used the Plant List. Will Pearse, I believe, had synonymies from the old Plant List (but I could be wrong on that, as we got some weird mappings which just took the first mapping, not the best). Will C, Rich Fitzjohn, and Matt Pennell were the ones who just downloaded and dealt with the new Plant List. I can write to any/all of them if you want to ask about this. I think they probably ignored it but I am not sure.

Let me know.

Best, Amy

dschwilk commented 10 years ago

I read through the methods on the Cornwell traits and lineages paper. I can email Will C as he is an old friend and see if there is anything he can share. I guess it is not efficient to re-invent things, but on the other hand, I don't mind re-implementing -- it is just the server query time and load that is silly to redo.

I emailed Will.

@ejforrestel and @AmyZanne : you guys both mention code in comments above in this issue. Can any of it go up here in /scripts? Or perhaps to the bigphylo repo as that will have a smaller set of users if there is concern?

dschwilk commented 10 years ago

Just to confirm: for GBIF records we are limited to creating a taxon name list and submitting that, correct? I ask, because the complete name-matching steps, as Amy pointed out, really should involve both synonymy and misspellings. But dealing with misspellings/gender through partial matching requires access to the full database of interest to match against. I'm assuming that for GBIF records we can just deal with synonymy. But I will work on some scripts for fuzzy name matching which will be useful for lining up our taxa (plus synonyms) with other trait databases. I'm writing Will C now about any synonymy table based on TPL.

Edit: Added some starter code for fuzzy matching using Levenshtein distance (7d95c2d)

Edit: See email conversation 3/27/14-3/30/14. I added code snippets attached to those emails to a scripts/contrib directory. See 0421d33.
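
For anyone following along, here is a minimal Python sketch of fuzzy name matching with Levenshtein distance. It is not the code from the commits referenced above; `levenshtein`, `best_match`, and the `max_distance` cutoff of 2 are assumptions made for illustration, and a real run would match against the full name list of the target database.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def best_match(query: str, candidates: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest candidate within max_distance edits, or None if there is none."""
    best, best_distance = None, max_distance + 1
    for candidate in candidates:
        distance = levenshtein(query.lower(), candidate.lower())
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


names = ["Quercus alba", "Quercus rubra"]
print(best_match("Quercus albus", names))  # Quercus alba (a gender-ending mismatch, 2 edits)
print(best_match("Pinus", names))          # None
```

The cutoff matters: too loose and distinct congeners get merged, too tight and simple gender-ending differences are missed, so any threshold would need checking against known cases.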

dschwilk commented 10 years ago

I'm closing this issue as it has been superseded by #7.