forc-db / ForC

Global Forest Carbon Database
https://forc-db.github.io/
Creative Commons Attribution 4.0 International

Update citations and reference library #223

Closed teixeirak closed 3 years ago

teixeirak commented 3 years ago

I'm opening this issue for the record, and to update @ValentineHerr.

@Troger4 is currently working on updating ForC_citations and the reference library. Tasks include the following (@Troger4 , you can check these off once you're done):

@Troger4, this is your first example of a GitHub issue. If you run into any questions related to this, it will be best if you post them in the issue (online or by responding to email). That way, we'll retain a record of what we've worked through and decided.

teixeirak commented 3 years ago

@Troger4 , I forgot to mention that a lot of citations are available here: https://www.dropbox.com/sh/znee1tak8t7zu6o/AAA-poV8sBLKvAPdIDBC4B0ga?dl=0 . These are references from another database (SRDB) that we imported. I don't know if this will be helpful at this point. SRDB records all have DOIs (when applicable), so this will be different from the set you're looking up.

teixeirak commented 3 years ago

@Troger4 ,

It would be great if you could make sure all of the following pubs have complete records (they probably do):

These are studies likely to go to IPCC in the first round.

Troger4 commented 3 years ago

I checked all the publications and added in a few abstracts. I also added in all of the "NACs" in the language column. I didn't see the last part of your message until now, so I did add language for records where citation.citation is NAC, but I don't believe there were many, and all of the remaining publications in this category already had language filled in. Very sorry for the confusion.

teixeirak commented 3 years ago

Many thanks, @Troger4, and no worries about adding language when citation.citation=NAC. This won't mess anything up.

teixeirak commented 3 years ago

@Troger4, @ValentineHerr, I realized that our method of getting language isn't foolproof, in that there are some studies with titles and abstracts translated into English. I know we have a few original studies in Mandarin and maybe other languages where this is the case (I found and corrected one: Yu_1999_doae). I don't see a way to resolve this short of accessing the original PDFs, which of course we'll do for any that we send to IPCC. Teagan, as you go through and retrieve/rename PDFs, you may find these and can correct the entries.

ValentineHerr commented 3 years ago

Thanks for fixing this, @teixeirak. Yes, it can be tricky... Also, IPCC wants the abstracts in English regardless, so @Troger4, when you see an abstract in a language other than English, please replace it with the English abstract that is usually in the PDF (or online). Thanks!

Troger4 commented 3 years ago

Thanks for letting me know! I’ll work on accessing the original documents to double check as I go through the list. If the study only has a summary and not an abstract, would you like me to use a summary in the same capacity?

Thank you!

ValentineHerr commented 3 years ago

Yes, if there is only a summary you can enter that, thanks!

Troger4 commented 3 years ago

What should I do if I encounter duplicates? Thanks!

teixeirak commented 3 years ago

Please delete duplicates (being sure that they’re the same!).

Troger4 commented 3 years ago

Will do. I'll also keep a record of deleted duplicates.
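As a rough aid for spotting candidate duplicates before deleting anything, a sketch like the following could group records by a normalized title. The column names (`citation.ID`, `citation.title`), the sample rows, and the helper names are assumptions for illustration, not taken from the ForC files, and every flagged group would still need to be checked by hand to confirm the entries really are the same.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def candidate_duplicates(rows, id_col="citation.ID", title_col="citation.title"):
    """Return lists of IDs whose normalized titles match (candidates only, not proof)."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize(row[title_col])].append(row[id_col])
    # Skip empty titles and keep only groups with more than one record.
    return [ids for key, ids in groups.items() if key and len(ids) > 1]


# Hypothetical rows, for illustration only.
rows = [
    {"citation.ID": "Smith_2001_csia", "citation.title": "Carbon stocks in a temperate forest."},
    {"citation.ID": "Smyth_2001_csia", "citation.title": "Carbon Stocks in a Temperate Forest"},
    {"citation.ID": "Jones_1999_npit", "citation.title": "Net primary productivity in tropics"},
]

duplicates = candidate_duplicates(rows)
print(duplicates)
```

Writing the flagged groups to a log before deleting would also give the record of removed duplicates mentioned above.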

ValentineHerr commented 3 years ago

Also, make sure you unify the citation.ID values in MEASUREMENTS so that every citation.ID in MEASUREMENTS maps to a citation.ID in CITATIONS. Does that make sense?
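One way to check this mapping is a set difference between the IDs used in MEASUREMENTS and those defined in CITATIONS. The sketch below is only an illustration: it assumes both tables are CSV files sharing a `citation.ID` column, and the other column names and the sample rows are made up.

```python
import csv
import io


def unmatched_citation_ids(measurements_rows, citations_rows, key="citation.ID"):
    """Return citation IDs used in MEASUREMENTS that are absent from CITATIONS."""
    known = {row[key].strip() for row in citations_rows}
    used = {row[key].strip() for row in measurements_rows}
    return sorted(used - known)


# Tiny in-memory stand-ins for the two tables (hypothetical rows and columns).
citations_csv = "citation.ID,citation.doi\nSmith_2001_abcd,NA\n"
measurements_csv = (
    "measurement.ID,citation.ID\n"
    "1,Smith_2001_abcd\n"
    "2,Jones_1999_efgh\n"
)

citations = list(csv.DictReader(io.StringIO(citations_csv)))
measurements = list(csv.DictReader(io.StringIO(measurements_csv)))

orphans = unmatched_citation_ids(measurements, citations)
print(orphans)  # IDs that appear in MEASUREMENTS but not in CITATIONS
```

An empty result would mean every citation.ID in MEASUREMENTS has a matching record in CITATIONS.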

teixeirak commented 3 years ago

I think we need a little more clarification here:

Troger4 commented 3 years ago

I haven't come across any citation.IDs that I have needed to change in citations; I have been renaming the PDF records in references to match the citation.ID in the citations doc. Thank you for the clarification!

ValentineHerr commented 3 years ago

@Troger4 and @teixeirak, something went wrong with CITATIONS. It looks like a merging issue. One of you needs to decide which version to keep.

teixeirak commented 3 years ago

@Troger4 , could we do a quick zoom call to resolve this?

teixeirak commented 3 years ago

Alternatively (and easier), @Troger4 , you can let me know when you're not working on the file and I can fix it (then you'll just need to be sure to pull again before editing).

Troger4 commented 3 years ago

Sorry I am just now seeing these. I'll be sure to save more as I go to prevent these issues. Could you show me how to deal with merging issues tomorrow during our Zoom meeting? Thank you!

teixeirak commented 3 years ago

@ValentineHerr , a couple questions following my discussion with @Troger4 this AM:

1- Will it mess things up if there are return characters within the abstract?

2- Is it useful to retrieve URL and abstract if there's a DOI but those fields remain blank (i.e., does your lookup process fail on some refs)?

ValentineHerr commented 3 years ago

> 1- will it mess things up if there are return characters within the abstract?

If it is carriage return \r and not new line return \n it should not mess things up.

> Is it useful to retrieve URL and abstract if there's a DOI but those fields remain blank (i.e., does your lookup process fail on some refs)?

If there is a DOI but no URL, that means my lookup process (using http://doi.org/[DOI]) failed to find the URL. So if you can retrieve a working URL while you are looking for the complete citation and language (required by IPCC), that would be great. A URL is not required for the IPCC import, but having it could be useful if we want to retrieve something else in the future. Abstracts are also not required by IPCC and are more of a pain (special characters, length, etc.), so either way is fine.
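For reference, a DOI-to-URL lookup of the kind described could look roughly like the sketch below. It is not the project's actual script: the function names and the placeholder DOI are hypothetical, and it uses the `https` form of the doi.org resolver rather than the `http` form quoted above.

```python
from urllib.parse import quote
import urllib.request


def doi_to_url(doi):
    """Build a doi.org resolver URL from a bare DOI or a DOI already given as a URL."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Keep "/" unescaped; percent-encode characters that are unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


def resolve_doi(doi, timeout=10):
    """Follow redirects from doi.org; return the landing URL, or None on failure."""
    request = urllib.request.Request(doi_to_url(doi), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.geturl()
    except OSError:  # covers URLError, HTTPError, and timeouts
        return None


# "10.1234/example" is a placeholder, not a real reference.
print(doi_to_url("10.1234/example"))
```

`resolve_doi` needs network access, and some publishers reject `HEAD` requests, so a `None` result only means the automated lookup failed; a manual search may still find a working URL.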

ValentineHerr commented 3 years ago

So it looks like there may have been a \n after all, because some data got moved down; e.g., the info about Meakem_2017_rots is now on the line below. I can't figure out which commit caused that. I'll try to find out.
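Rows that have been pushed out of alignment can often be located by comparing each row's field count with the header's. The sketch below assumes CITATIONS is a comma-delimited text file; the column names and sample content are invented, and the check makes no claim about what caused the shift.

```python
import csv
import io


def misaligned_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader)
    problems = []
    for row in reader:
        if len(row) != len(header):
            # line_num is the physical line on which this record ended.
            problems.append((reader.line_num, len(row)))
    return problems


# An unquoted line break splits one record across two lines (hypothetical data).
broken = (
    "citation.ID,citation.title,citation.abstract\n"
    "A_2000_abcd,Title A,First part\n"
    "second part of abstract\n"
    "B_2001_efgh,Title B,Abstract B\n"
)
# The same line break inside a quoted field is parsed as part of the field.
quoted = (
    "citation.ID,citation.title,citation.abstract\n"
    'A_2000_abcd,Title A,"First part\nsecond part of abstract"\n'
    "B_2001_efgh,Title B,Abstract B\n"
)

print(misaligned_rows(broken))
print(misaligned_rows(quoted))
```

The check flags the spilled-over line rather than the record that started the break, so the row just above a flagged line is usually the place to look.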

teixeirak commented 3 years ago

I think there may be more than one. Those would be commits from yesterday. Teagan now knows to avoid them.

teixeirak commented 3 years ago

@Troger4, it turns out that the returns in the abstracts were not causing the problem. Valentine found and fixed it. Please see the messages on Teams, and be sure to pull the latest version and work from that one.

Troger4 commented 3 years ago

Will do, thank you!

teixeirak commented 3 years ago

@Troger4 , could you please prioritize the following (relates to this issue):

Troger4 commented 3 years ago

Yes, I'll start on these now.

Troger4 commented 3 years ago

I see some of these already have DOIs, would you like me to manually fill in the other field info?

teixeirak commented 3 years ago

I think they should be fine if they have DOIs. (Maybe those are the DOIs you've added since Valentine pulled the info?) @ValentineHerr, do you agree?

ValentineHerr commented 3 years ago

Well, my script skips records that have no full citation or language, because those are required by IPCC, so it would be great to add them.

ValentineHerr commented 3 years ago

These are for allometries, so I could change my script to keep them, but that would mean giving just the DOI rather than the full citation for the biomass allometry slot.

teixeirak commented 3 years ago

I think we have some confusion here. What I meant above was that if Teagan adds the DOI, you should be able to pull the other info, correct?

I think it's fine to just give the DOI if the full citation is not available, but shouldn't be necessary.

ValentineHerr commented 3 years ago

Oh... sorry I see the confusion, my bad. I was not planning on re-running that code that pulls the rest of the info as I thought that subsequent manual lookups might as well fill in the info which would for sure be accurate.

ValentineHerr commented 3 years ago

And, if there is already a DOI that means my code was not able to pull any info.

teixeirak commented 3 years ago

> Oh... sorry I see the confusion, my bad. I was not planning on re-running that code that pulls the rest of the info as I thought that subsequent manual lookups might as well fill in the info which would for sure be accurate.

Hmmm... I think we should do this, at least for the subset with DOI and lacking the other info. None of the references from GROA came with DOIs, so that's a lot of studies. I asked @Troger4 to just put the DOI.
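The subset in question (records that have a DOI but still lack other info) could be selected with a filter along these lines. This is a sketch under assumptions: the column names `citation.doi` and `citation.language`, the set of missing-value codes, and the sample rows are guesses for illustration rather than the actual ForC schema or lookup script.

```python
MISSING = {"", "NA", "NAC"}  # assumed missing-value codes


def is_missing(value):
    """Treat None, blanks, and the assumed codes as missing."""
    return value is None or value.strip() in MISSING


def needs_lookup(rows, doi_col="citation.doi",
                 info_cols=("citation.citation", "citation.language")):
    """Return IDs of records that have a DOI but lack at least one info field."""
    return [
        row["citation.ID"]
        for row in rows
        if not is_missing(row.get(doi_col))
        and any(is_missing(row.get(col)) for col in info_cols)
    ]


# Hypothetical records, for illustration only.
rows = [
    {"citation.ID": "A_2000_aaaa", "citation.doi": "10.1234/a",
     "citation.citation": "NAC", "citation.language": "English"},
    {"citation.ID": "B_2001_bbbb", "citation.doi": "NA",
     "citation.citation": "NAC", "citation.language": "NAC"},
    {"citation.ID": "C_2002_cccc", "citation.doi": "10.1234/c",
     "citation.citation": "Full citation text", "citation.language": "English"},
]

to_rerun = needs_lookup(rows)
print(to_rerun)
```

Only the first record qualifies here: the second has no DOI to look up, and the third is already complete.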

teixeirak commented 3 years ago

If the lookup failed, that would be something to fill in eventually, but I've asked Teagan to move through these quickly and just pull missing DOIs, with lower priority for the other info. The ones with missing info may tend to be tricky, and I figured we could pull those later as needed.

ValentineHerr commented 3 years ago

Ok, I'll re-run that script when there is a substantial number of new DOIs. (Maybe that is already the case?)

Troger4 commented 3 years ago

I'll try to get as many DOIs as I can tomorrow, and we can rerun Sunday or Monday before the deadline. For these pubs, Brown_1997_ebab is a book and doesn't have an abstract, so I put NA. I wasn't able to find Ovington_1970_bacc through Google Scholar or the George Mason Library. Kenzo_2009_doar, Ker_1980_tbef, and Ker_1984_befs are not in the CITATIONS document.

teixeirak commented 3 years ago

Thanks, @Troger4 . I don't think any of these are limiting us in terms of what we submit for this first deadline, and we probably won't send them anything past today (because of the weekend, and their Monday starts our Sunday evening).

teixeirak commented 3 years ago

I started a list of missing citations in issue #228 .

Troger4 commented 3 years ago

Great, thank you.

Troger4 commented 3 years ago

Hi there! I found a CITATION with the first name as the citation.ID instead of last name. Creighton_2004_eotd should be Litton_2004_eotd.

teixeirak commented 3 years ago

> Hi there! I found a CITATION with the first name as the citation.ID instead of last name. Creighton_2004_eotd should be Litton_2004_eotd.

Thanks, @Troger4 ! I've fixed that in the other files. Please delete Creighton_2004_eotd from the citations spreadsheet and give the pdf the correct name.

Troger4 commented 3 years ago

Will do! Thank you.

teixeirak commented 3 years ago

@ValentineHerr , @Troger4 has finished filling in the missing DOIs, when feasible. You can now re-run the info look-up script for those references.

ValentineHerr commented 3 years ago

Looks like there may still be some merging issues with CITATIONS.

teixeirak commented 3 years ago

I'll check into it.

ValentineHerr commented 3 years ago

I added what the script was able to retrieve (forgot to refer to this issue in the commits).

teixeirak commented 3 years ago

Okay, thanks. I'll close this issue now. We can retrieve missing info on an as-needed basis.

@Troger4 , many thanks for all your help!

Troger4 commented 3 years ago

Of course! Thank you for teaching me how it all works!