PecanProject / bety

Web-interface to the Biofuel Ecophysiological Traits and Yields Database (used by PEcAn and TERRA REF)
https://www.betydb.org
BSD 3-Clause "New" or "Revised" License

Bulk upload not accepting certain variables #386

Closed: abstewa2 closed this issue 8 years ago

abstewa2 commented 8 years ago

When I try to upload this file:

phenotypes.txt

to the terra-test instance (http://pecandev.igb.illinois.edu/terra-test) of BETYdb, it reports that many of the data column values are out of range for their variables. I get the following:

[screenshot of the out-of-range error messages]

Also, in order for the cultivars to be recognized, cultivar numbers 9915 through 10414 will need to be entered into the system. I would enter them manually myself, but I recall it being mentioned that they could easily be added in bulk some other way.

dlebauer commented 8 years ago

@abstewa2 I've added the cultivars and changed the allowable range of cuticular_cond (to min = 900) so you can try again.
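
For the record, the range change amounts to something like this (a sketch; the exact statement may have differed):

-- sketch of the change described above; the prior min value is unknown
update variables set min = 900, updated_at = now() 
    where name = 'cuticular_cond';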

@gsrohde

  1. do you know why leafT is marked as out of range?
  2. how do we allow additional traits to be uploaded? For this dataset we need quantum_efficiency, growth_respiration_coefficient, chi_leaf, and extinction_coefficient_diffuse, but we will need to add more variables as they come along.
abstewa2 commented 8 years ago

@dlebauer The cultivars worked. [screenshot of recognized cultivars]

However, the other variable values are still out of range.

gsrohde commented 8 years ago

@dlebauer Regarding leafT being out of range, there's a bug in the bulk upload software whereby a max value of 'Infinity' (which is what we set max to whenever it was unspecified) becomes 0.0 when converted to a float. I'll make a bug report for this, but for now I've changed the max column value for leafT to '60' in the terra-test database.
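
For reference, a change along these lines would do it (a sketch of what was run, not the exact statement):

update variables set max = 60, updated_at = now() 
    where name = 'leafT';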

For a trait variable to be recognized during a bulk upload, the trait_covariate_associations table must contain at least one row having that variable's id as the value of the trait_variable_id column. Since I think it was said that "age" should be an optional covariate for every trait variable, you can use 343 (the id of the variable "age") as the value in the covariate_variable_id column and 'f' as the value in the required column.
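
A row of that kind would look something like this (a sketch; 999 stands in for the actual trait variable's id):

-- 999 is a hypothetical trait variable id; 343 = "age", required = false
insert into trait_covariate_associations (trait_variable_id, covariate_variable_id, required) 
    values (999, 343, 'f');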

(Note that from what I remember, a given variable can't be used as both a trait variable and a covariate.)

gsrohde commented 8 years ago

The only bug here is tracked in a separate issue, #387. Closing.

dlebauer commented 8 years ago

Actually, for these specific traits, it would work to set reasonable min and max values in the variables table. However, this should be done at betydb.org. I will go ahead and set them on both instances so that you can upload this dataset.

So I've done this:

update variables set min = 0, max = 1, updated_at = now() 
    where name in ('quantum_efficiency', 'growth_respiration_coefficient');
update variables set min = 0, max = 100, updated_at = now() 
    where name = 'chi_leaf';
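
To double-check the stored ranges afterwards, a query along these lines should work (a sketch):

select name, min, max from variables 
    where name in ('quantum_efficiency', 'growth_respiration_coefficient',
                   'chi_leaf', 'extinction_coefficient_diffuse');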

For extinction_coefficient_diffuse, the min/max are already set appropriately to [0, 1], so perhaps the problem is that the header is written in ALL CAPS and the bulk upload is case-sensitive. So go ahead and try again, and close this issue when it is fixed.

gsrohde commented 8 years ago

@dlebauer Do you want me to do this, or @abstewa2? And is pecandev/terra-test the only instance to be uploaded to?

Regarding extinction_coefficient_diffuse, it will be ignored unless it is added to the trait_covariate_associations table (either as a primary trait or as a required or optional covariate for a trait), regardless of case. Several other columns (namely quantum_efficiency, growth_respiration_coefficient, and chi_leaf) are ignored for the same reason.

dlebauer commented 8 years ago

Sorry, I forgot about that. I will do this.

dlebauer commented 8 years ago

I've updated these on both ebi_production (betydb.org) and terra-test (on pecandev):

INSERT INTO trait_covariate_associations (
    trait_variable_id,
    covariate_variable_id,
    required
)
VALUES
    (39, 343, 'f'),
    (492, 343, 'f'),
    (568, 343, 'f'),
    (493, 343, 'f');
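
To confirm those ids map to the intended traits, a check along these lines should work (a sketch; ids taken from the statements above):

select id, name from variables 
    where id in (39, 492, 493, 568);
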
abstewa2 commented 8 years ago

I tried the upload again just now, and everything looks great except for what I assume is the issue Scott mentioned about the extinction_coefficient_diffuse column.

This is the screen I receive at http://pecandev.igb.illinois.edu/terra-test/bulk_upload/display_csv_file:

[screenshot of the display_csv_file page]

Also, I have lately noticed that the server runs pretty slowly when bulk uploading. It will sometimes take 10-15 minutes for the PEcAn website to finally upload a file. I will try connecting an Ethernet cable to see if that solves the problem, but everything else on the computer runs very efficiently, so I don't think it is an internet issue. I'm not sure whether this is something that can be solved, but I figured it was worth mentioning as a side note.

dlebauer commented 8 years ago

I've reduced the minimum valid value for cuticular_cond that was causing the 'out of range' error.

@gsrohde do you know why the extinction_coefficient_diffuse trait is not recognized here? It is in the trait_covariate_associations table [1]. Not sure if it is related, but the trait name in the web interface shows up as ALL CAPS whereas it is lowercase in the variables table and the upload spreadsheet.

[1]
select name from variables 
    where id in (select distinct trait_variable_id from trait_covariate_associations);

gsrohde commented 8 years ago

@dlebauer, @abstewa2 The variable names are case-sensitive, so EXTINCTION_COEFFICIENT_DIFFUSE is not going to be recognized if the trait_covariate_associations table has it as extinction_coefficient_diffuse.
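
For example, assuming standard PostgreSQL string comparison, the mismatch looks like this (a sketch):

-- an exact match is case-sensitive, so the ALL CAPS header finds nothing
select id from variables where name = 'EXTINCTION_COEFFICIENT_DIFFUSE';
-- a case-insensitive comparison would find the row
select id from variables where lower(name) = lower('EXTINCTION_COEFFICIENT_DIFFUSE');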

@abstewa2 I too have noticed that the validation stage of the bulk upload script runs very slowly on pecandev. The top command shows ruby using 97+% of the CPU while the validation script is running. I've noticed slowness before, but this seems worse than usual. I think it may be because this file has so many columns of variables, all of which get checked to see that they are in range. But I'd have to profile the script to see if that is really what is going on. It also seems that pecandev is slower than either my local machine or the production machine.

dlebauer commented 8 years ago

@gsrohde the web interface shows EXTINCTION_COEFFICIENT_DIFFUSE in all caps even though the csv file has the column name in lowercase.

@abstewa2 could you please upload the file that causes the error?

abstewa2 commented 8 years ago

Sure. Hold on one sec

abstewa2 commented 8 years ago

@dlebauer @gsrohde

phenotypes final.txt

This is the file that is causing the errors. In the Excel sheet, the extinction_coefficient_diffuse variable is lowercase; however, when I put it through the bulk upload, it gets capitalized for some reason.

gsrohde commented 8 years ago

@abstewa2 It is probably capitalized by Excel when you export the spreadsheet as a raw text file. If you are certain it is NOT capitalized at the time you select it in the Bulk Upload wizard and that the Bulk Upload process is capitalizing it, assign this back to me and I'll look into this. If this issue is done, please close it.

dlebauer commented 8 years ago

@gsrohde the text file exported by Excel does not have capital letters. @abstewa2 uploaded it in her comment above; this is the link: phenotypes final.txt

It is comma-separated, despite the .txt file extension.

gsrohde commented 8 years ago

@abstewa2 , @dlebauer

There was a bug in the header-normalization routine. It's fixed now on terra-test, so the upload should work.

abstewa2 commented 8 years ago

The problem is that the server is running very slowly now. Even when I upload a shortened version of the file, the page takes a very long time to load. I have now been waiting 20 minutes for this shortened file to upload on pecandev. I will wait a little longer, and if it doesn't upload I will submit another issue.

abstewa2 commented 8 years ago

[screenshot of the successful bulk upload]

The bulk upload did in fact work this time.

gsrohde commented 8 years ago

@abstewa2 Please close this if you think this issue is resolved. Make a new issue for the bulk upload running slowly if you see fit to do so. You can assign it to me.

gsrohde commented 8 years ago

The header-normalization bug was fixed by Release 4.6. The out-of-range bug (issue #387) was fixed by Release 4.8.

@abstewa2 Please close this issue and if appropriate, make a new issue for the bulk upload running slowly.