PecanProject / bety

Web-interface to the Biofuel Ecophysiological Traits and Yields Database (used by PEcAn and TERRA REF)
https://www.betydb.org
BSD 3-Clause "New" or "Revised" License

Upload Salix Data Set #283

Open echeng7 opened 9 years ago

echeng7 commented 9 years ago

Data are in the all.yielddata Google doc spreadsheet here:

https://docs.google.com/spreadsheets/d/1hrvMTMc7scvQFgit7QqF68vfxxNufNC--3RJfSDhne4/edit#gid=109120137

@dlebauer @gsrohde

gsrohde commented 9 years ago

@echeng7 Please try the Bulk Upload validation feature on this spreadsheet to see the problems. Regarding the "date" column: If no dates need to be updated, remove this column or append "-ignore" to the column heading: "date-ignore". If dates do need to be updated, then they all need a month and day in addition to the year. If only the year is available from the data source, then use "01" for both the month and the day.
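As a rough illustration of the two options for the date column, here is a minimal Python sketch. It assumes dates are written as YYYY-MM-DD, which is an assumption and not something stated in this thread; the helper names `pad_year_only` and `ignore_column` are made up for the example and are not part of BETYdb.

```python
import re


def pad_year_only(value: str) -> str:
    """Turn a bare four-digit year into YYYY-01-01; leave other values alone.

    Assumption: the upload expects YYYY-MM-DD. Check the Bulk Upload
    documentation for the format it actually accepts.
    """
    value = value.strip()
    if re.fullmatch(r"\d{4}", value):
        return f"{value}-01-01"
    return value


def ignore_column(header: list[str], name: str) -> list[str]:
    """Append "-ignore" to one column heading so that column is skipped."""
    return [f"{h}-ignore" if h == name else h for h in header]


padded = pad_year_only("2009")
untouched = pad_year_only("2009-05-12")
new_header = ignore_column(["date", "species", "yield"], "date")
```

Only one of the two helpers would be used on a given spreadsheet: pad the dates if they need updating, otherwise rename the heading.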

echeng7 commented 9 years ago

Again, same problem as with the poplar set. I don't know if it is just me, but I keep getting an error when I check the bulk upload with rows 33, 34, 35, 57, 58, 59, 60. The cultivars are copied straight from the database with their respective species onto the spreadsheet, but when it is uploaded, it changes. Can you look into this for me?

Also, some of the yields did not specifically say which species they were from. Should it be changed to Salix spp.? @gsrohde

dlebauer commented 9 years ago

Emily, can you try opening the CSV in Notepad and copy-pasting in the cultivar names?

echeng7 commented 9 years ago

It still will not work for me. Other suggestions?

dlebauer commented 9 years ago

Change both names to something simple like "testname1".

dlebauer commented 9 years ago

What do you mean by 'when it is uploaded, it changes'? Can you provide screenshots?

gsrohde commented 9 years ago

@echeng7 The problem here was that in the database, "Salix schwerinii x S. viminalis" had an extra space before the "x". But the upload software will strip out any extra spaces before trying to match against the database, so even though the species name as typed in the upload file exactly matched the database entry, when the software turned your double space into a single space (it assumes double spaces are typos), the two names no longer matched.

I went through a few months ago and made sure there were no extraneous spaces in species names in the database, but this particular one was a rather recent entry, and the database does not yet enforce the "no double-space" rule.

There was also a problem with using the letter "x" vs. using the times symbol "×" in the names of species hybrids, but I won't go into this at the moment.

I will try to come up with a robust solution to both problems, but for now, I've corrected the database, so you should no longer get errors for those seven rows. There is still a problem, however, with the species column being blank in several rows, starting in row 151.
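To make the mismatch concrete, here is a small Python sketch of the failure mode described above. It is not the upload software's actual code: `match_key` is a hypothetical helper, and treating "x" and "×" as equivalent for matching is an assumption made only for illustration.

```python
import re


def match_key(name: str) -> str:
    """Build a comparison key: collapse runs of whitespace, trim the ends,
    and (as an assumption) treat the times symbol like the letter x."""
    collapsed = re.sub(r"\s+", " ", name).strip()
    return collapsed.replace("×", "x")


# The database entry had an extra space before the "x".
stored = "Salix schwerinii  x S. viminalis"
# The spreadsheet value was copied verbatim from the database.
uploaded = "Salix schwerinii  x S. viminalis"

# Cleaning only the uploaded value breaks an otherwise exact match.
one_sided_match = re.sub(r"\s+", " ", uploaded).strip() == stored

# Applying the same cleanup to both sides restores the match.
two_sided_match = match_key(uploaded) == match_key(stored)
```

Normalizing both sides of the comparison, or enforcing the no-double-space rule in the database itself, are two ways a more robust fix could go.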

gsrohde commented 9 years ago

@echeng7 In regard to your question about some of the yields not specifically saying which species they were from and whether they should be changed to Salix spp., please ask David about this.

echeng7 commented 9 years ago

Also, some of the yields did not specifically say which species they were from. Should it be changed to Salix spp.? @dlebauer

dlebauer commented 9 years ago

Yes, Salix spp.

dlebauer commented 9 years ago

@gsrohde trying to upload this, I get the error "Your file contains 1 error. You can not upload your data set until this is corrected." But that doesn't seem correct ...

gsrohde commented 9 years ago

The problem is that the column name for specifying the site should be "site" not "sitename". Since the wizard doesn't recognize "sitename", it expects that you are going to specify one site interactively for the whole data set. But there is no one site that is associated with all of the citations in the dataset, hence the error message "There are no sites common to all the citations in the file." Also, you will see "sitename" in the list under the warning "These columns will be ignored."

Once you correct the column name, you will see warnings about two sites not being found. That is because while there are sites with "Wildehausen" and "Grabow" as city names, the sitename column in these two rows is blank. The simplest solution is just to use the city names for these two sites as the sitenames.

If you want me to take care of this and then upload the data to the live site, let me know and assign this back to me.
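For anyone fixing the heading outside a spreadsheet program, the rename can be sketched in Python with the standard `csv` module. The function name `rename_column` and the sample data are hypothetical; only the "sitename" to "site" change comes from the explanation above.

```python
import csv
import io


def rename_column(csv_text: str, old: str, new: str) -> str:
    """Return the CSV text with one heading renamed; data rows are unchanged."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = [new if heading.strip() == old else heading for heading in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical two-row sample; the real data set has many more columns.
sample = "sitename,yield\nGrabow,5.1\n"
fixed = rename_column(sample, "sitename", "site")
```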

dlebauer commented 9 years ago

Please do, thanks!

gsrohde commented 9 years ago

Actually, the Wildehausen site was misspelled--it should have been Wildeshausen. (I changed the data sheet.) There are two Wildeshausen sites, but only one had that in the sitename column (the other had a blank sitename), and it had more complete data, so I'm using it.

I've uploaded this data set, but I'm assuming we are going to copy data from the new rows to the old ones as we did for previous sets from Emily. So, assuming this is the case, I need to know exactly what to copy and which columns of the old rows to leave alone. The scripts we used for doing this for the miscanthus set are here: https://gist.github.com/gsrohde/b4b4e12878364b05f5c6

dlebauer commented 9 years ago

From the old script, it looks like you only copied the following from the new rows:

Do you know why we didn't update citation_id, site_id, or treatment_id (or am I missing something)?

gsrohde commented 9 years ago

I'm guessing that I first ran a script that checked what the differences between the old and the new rows were and found that none of the citations, sites, or treatments changed. This was the first set we did like this, so we were still figuring out exactly how to go about it.
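The kind of check described here could look roughly like the Python sketch below. It is not the script that was actually run; `changed_columns` and the sample rows are hypothetical, and only the column names citation_id, site_id, and treatment_id come from this thread.

```python
def changed_columns(old_row: dict, new_row: dict, columns: list[str]) -> list[str]:
    """Return the names of the given columns whose values differ between rows."""
    return [c for c in columns if old_row.get(c) != new_row.get(c)]


# Hypothetical old and new versions of one yields row.
old = {"citation_id": 12, "site_id": 7, "treatment_id": 3, "mean": 8.2}
new = {"citation_id": 12, "site_id": 7, "treatment_id": 3, "mean": 8.4}

key_changes = changed_columns(old, new, ["citation_id", "site_id", "treatment_id"])
all_changes = changed_columns(old, new, list(old))
```

An empty `key_changes` for every row pair would support copying only the remaining columns.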

dlebauer commented 9 years ago

Then we are about on the same page. If you can upload, please do.

gsrohde commented 9 years ago

I did upload. I want to know what more I need to do with this set.

dlebauer commented 9 years ago

At this point, I think the next step is to insert managements. I need to re-format the dataset from wide to long and then run `script/insert_managements.rb` (#288, #330).

I'll work on that and let you know if I have any issues.
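For reference, a wide-to-long reshape can be sketched in plain Python as below. The column names (`yield_id`, `planting_density`, `fertilizer_n`) and the output keys `variable` and `value` are placeholders; the input layout that `script/insert_managements.rb` actually expects is not described in this thread.

```python
def wide_to_long(rows: list[dict], id_cols: list[str], value_cols: list[str]) -> list[dict]:
    """Emit one output row per non-empty value column of each input row."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            value = row.get(col)
            if value is None or value == "":
                continue  # skip cells with no recorded management
            long_row = {key: row[key] for key in id_cols}
            long_row["variable"] = col
            long_row["value"] = value
            long_rows.append(long_row)
    return long_rows


# Hypothetical wide rows: one row per yield, one column per management type.
wide = [
    {"yield_id": "1", "planting_density": "15000", "fertilizer_n": ""},
    {"yield_id": "2", "planting_density": "12000", "fertilizer_n": "100"},
]
long_format = wide_to_long(wide, ["yield_id"], ["planting_density", "fertilizer_n"])
```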

dlebauer commented 9 years ago

Do you know if we have actually used this script yet? (e.g. with switchgrass #263 or miscanthus #254, I think not)

gsrohde commented 9 years ago

Yeah, I don't think so. The script was written in late June and the only managements with a creation date later than May are two from yesterday.

gsrohde commented 9 years ago

OK, I've updated the old rows and deleted the new ones.

The scripts I used to do this, and a dump of the old rows before and after the updates, are stored under my home directory on ebi-forecast at BETYdb_database_maintenance/salix_bulk_upload/SQL-fixes.