UofS-Pulse-Binfo / analyzedphenotypes

Tripal/Drupal support for analyzed phenotypic data including data loaders, exporters, trait pages and summaries on germplasm pages.
GNU General Public License v2.0
0 stars 1 forks source link

Trait - Method - Unit Data Storage: Support during Upload #46

Closed laceysanderson closed 5 years ago

laceysanderson commented 5 years ago

Metadata

Documentation:

Description

We should store our trait data according to the Trait - Method - Unit Data Storage Method. In this method, each recorded phenotypic value is directly connected to the trait, method and units used to measure it. This is needed to ensure that measurement of traits using different methods is correctly differentiated and supported.

Testing?

Functional Testing

  1. Create a project (Admin » Add Tripal Content » Project)
  2. Create germplasm for your file
    $species = 'databasica';
    for ($i=1; $i <= 15; $i++) {
    $stock = [
     'name' => 'GERM'.$i,
     'uniquename' => 'ID:'.$i,
     'type_id' => ['name' => 'SNP'],
     'organism_id' => ['species' => $species],
    ];
    $stock = chado_insert_record(
    'stock',
     $stock
    );
    }
  3. Configure the module for your organism (Administration » Tripal » Extensions » Analyzed Phenotypes » Configuration) by selecting at least a "Trait Vocabulary" and "Associated DB". Which you choose doesn't matter for testing purposes other then to ensure the trait is added to the vocabulary when loaded. Documentation: https://analyzedphenotypes.readthedocs.io/en/latest/admin_guide/configuration.html
  4. Test Upload (Administration » Tripal » Extensions » Analyzed Phenotypes » Upload Phenotypic Data). Documentation: https://analyzedphenotypes.readthedocs.io/en/latest/user_guide/loading.html

How to pull out the results from the database:

Using Execute PHP Code ([yoursite]/devel/php):

    $debug_query =  "
      SELECT json_build_object(
        'phenotype_id', p.phenotype_id,
        'trait_id', p.attr_id,
        'method_id', p.assay_id,
        'unit_id', p.unit_id,
        'stock_id', p.stock_id,
        'project_id', p.project_id,
        'value', p.value,
        'properties', array_to_json(array_agg(props.json))
      ) as json
      FROM chado.phenotype p
        LEFT JOIN (
        SELECT row_to_json(prop) as json, phenotype_id
        FROM chado.phenotypeprop prop
      ) props ON props.phenotype_id=p.phenotype_id
      GROUP BY p.phenotype_id, p.attr_id, p.stock_id, p.project_id, p.value";
    $results = chado_query($debug_query)->fetchAll();
    foreach($results as $result) {
      $noSlahes = str_replace('\\', '', $result->json);
      $jsonObject = json_decode($noSlahes);
      dpm($jsonObject);
    }

This will print each phenotype record with associated property records.

Automated Testing

  1. Tests module install
  2. Tests data integrity after data upload:
    • analyzedphenotypes/tests/example_files/AnalyzedPhenotypes-TestData-1trait3loc2yr3rep.txt is loaded using analyzedphenotypes_save_tsv_data() directly.
    • The module is configured and project, trait and stocks are created before the test is run (due to the design of the function).
    • For every line of the file, it is confirmed that phenotype record was created correctly with the trait, method, unit, project, stock and value combination expected. Additionally, it is confirmed that 4 meta-data records (stockprop; location, year, replicate and data collector) where added.
  3. A Phenotype database Seeder is added for easier testing in the future and an associated test ensures that the data created by the seeder matches the expected format as was done for upload.
  4. A Download test has been started but not completed.
reynoldtan commented 5 years ago

Updates work as described, API in its own file which follows overall module API and file structure. Confirmed Trait-Method-Unit data were created and relationships properly defined in cvterm_relationship. Was not expecting that Genus could be numerous, in my head they're just under 7 thingies :) 👍

Comments:

  1. Throughout the module, all array definition uses: $var = array(element1 => element1 value, ...) array constructs. For consistency, I would suggest to convert short array syntax [] in the following sections: @file include/analyzedphenotypes.trait.api.inc @file include/analyzedphenotypes.uploaddata.page.inc line #857.

  2. The insert trait api uses a hard coded name for "unit" and "method". The term "method" is currently available in the configuration page as a system variable. I believe we need to include additional term for "unit". In addition, an API to fetch this system variable is available in sysvars.api.

Other than these two, this PR is good for merged. @laceysanderson #2, I can include this change when I go over the module to test, change or update the following:

  1. Validators - It checks if traits in Trait (unit) format exist that might always fail since we have a new way of saving trait.
  2. Suggest Similar Trait in describe stage - In my test, it suggested a trait without the unit part.
  3. Saving in last stage
  4. Summary page and data visualizer - shows only trait names.
  5. Trait describe counter in describe stage - With new field elements (method of collection) counter does not increment having entered a value in this field.
  6. Data download - Trait does not contain the unit part in the exported file.
  7. Noticed several unused block of codes.
laceysanderson commented 5 years ago

Thanks for the review @reynoldtan! I will fix the items you pointed out 👍

carolyncaron commented 5 years ago

It brings me great sadness to say this, but I am struggling to get anything uploaded to the system... 😭 I am testing on dev/B, and have verified the following:

  1. Switched to branch: 40-data_storage
  2. Successfully created my own project: Carolyn's Awesome Project
  3. Germplasm added through the interface (GERM1 through to GERM15) are confirmed to exist in the chado.stock table
  4. I've configured "Trait Vocabulary" and "Associated DB" for Tripalus databasica (I also added this as an organism for testing and it shows up in the config)
  5. The format of my input file matches that of the example file for automated testing (sadly the user guide doesn't mention anything about format!)

When I go to the Upload Phenotypic Data page, I am able to select my project, genus and drag my file and the inital validation passes. Step 2, I run the job and everything looks good. For Stage 3, I have tried both adding new traits "Awesome trait (awesome unit)" and using pre-existing traits by changing my file to have "Plant height (cm)" instead; regardless, at Stage 4 I receive the following error when running my job (and eventually, an ajax error appears in place of the loading bar).

2018-11-22 15:38:12: Calling: analyzedphenotypes_save_tsv_data(a:6{s:12:"project_name";s:25:"Carolyn's Awesome Project";s:13:"project_genus";s:8:"Tripalus";s:9:"data_file";s:4:"2100";s:12:"trait_cvterm";a:1:{s:28:"Awesome trait (awesome unit)";s:4:"6368";}s:13:"method_cvterm";a:1:{s:28:"Awesome trait (awesome unit)";s:4:"6369";}s:11:"unit_cvterm";a:1:{s:28:"Awesome trait (awesome unit)";s:4:"6370";}}) 0% Complete... 0% Complete... WD : PDOException: SQLSTATE[42703]: Undefined column: 7 ERROR: column "unit_id" of relation "phenotype" does not exist [error] LINE 1: ...pe (uniquename, observable_id, attr_id, assay_id, unit_id, v... ^: INSERT INTO chado.phenotype (uniquename, observable_id, attr_id, assay_id, unit_id, value, cvalue_id, project_id, stock_id) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8); Array ( ) in analyzedphenotypes_save_tsv_data() (line 996 of /var/www/dev/B/sites/all/modules/custom/analyzedphenotypes/include/analyzedphenotypes.validators.inc). CRITICAL (): CODE:107 Failed to insert records. [VALUE: Tripal Job]

[site http://default] [TRIPAL CRITICAL] [] CODE:107 Failed to insert records. [VALUE: Tripal Job] CRITICAL (): Failed to load phenotypic data in Tripal Job #12891. [site http://default] [TRIPAL CRITICAL] [] Failed to load phenotypic data in Tripal Job #12891. Drush command terminated abnormally due to an unrecoverable error.

The INSERT INTO pointer is directly under "unit_id"

Also, I noticed when choosing a pre-existing trait in Step 3, "Collection Method" is blank despite having filled it out when I originally added the trait. I'm not sure if this is a result of the data storage or a process of pulling out the information which may be fixed down the road anyway (regarding the updates that Reynold mentioned).

Have I configured something wrong? Or is there something obvious that I'm missing??? 😞

carolyncaron commented 5 years ago

Lacey suggested that I was testing on a KP clone that still had the original table structure defined and this was causing my problems. Once I re-installed the module, the above issue was fixed 👍

carolyncaron commented 5 years ago

My file is now able to be uploaded, but I've been having issues pulling the results out using the query provided. We've narrowed down this issue to likely being due to quotes being around the location values in my input file. For some reason, the quotes also end up in the database and this breaks the json formatting. Oops 😬

I then created a new project (Carolyn's Second Project) so that I can continue testing and pull out results without breaking json (but leave the current quotes intact for investigation purposes). Now I'm seeing the following error in Stage 2:

A value does not match the data type expected in items: in Trait Name line #2 (Mediocre Trait is not a valid trait)

On the terminal I get:

2018-11-28 09:11:32: Calling: analyzedphenotypes_validate_tsv_data(a:3:{s:12:"project_name";s:24:"Carolyn's Second Project";s:13:"project_genus";s:8:"Tripalus";s:9:"data_file";s:4:"2114";}) 0% Complete... 25% Complete... Error limit reached. Job terminated. 100% Complete...

Finally, I ran the automated tests and I get 3 errors and 1 warning which all appear to be related. Each one shares the following PDOException:

PDOException: SQLSTATE[42703]: Undefined column: 7 ERROR: column "type_id" of relation "organism" does not exist LINE 1: ...anism (abbreviation, genus, species, common_name, type_id) V... ^

_The pointer is below "typeid"

laceysanderson commented 5 years ago

Finally, I ran the automated tests and I get 3 errors and 1 warning which all appear to be related. Each one shares the following PDOException:

PDOException: SQLSTATE[42703]: Undefined column: 7 ERROR: column "type_id" of relation "organism" does not exist
LINE 1: ...anism (abbreviation, genus, species, common_name, type_id) V...
^

The pointer is below "type_id"

This error was due to a non-standard chado on the dev site.

laceysanderson commented 5 years ago

From @reynoldtan

Validators - It checks if traits in Trait (unit) format exist that might always fail since we have a new way of saving trait.

Now fixed

Suggest Similar Trait in describe stage - In my test, it suggested a trait without the unit part.

unit and collection method are incorrect

Saving in last stage

Works 👍

Summary page and data visualizer - shows only trait names.

Data visualizer should perhaps show unit? and ensure not to collapse different units. This is an issue for another PR: #32

Trait describe counter in describe stage - With new field elements (method of collection) counter does not increment having entered a value in this field.
Data download - Trait does not contain the unit part in the exported file.

Also for another PR: #49

From @carolyncaron

My file is now able to be uploaded, but I've been having issues pulling the results out using the query provided. We've narrowed down this issue to likely being due to quotes being around the location values in my input file. For some reason, the quotes also end up in the database and this breaks the json formatting.

I checked this and with single quotes, they do not seem to make it into the database. I will test multiple quotes and escape any internal quotes.

I'm seeing the following error in Stage 2: A value does not match the data type expected in items: in Trait Name line #2 (Mediocre Trait is not a valid trait)

This is now fixed (same as Check validators from @reynoldtan).

Thus the unfinished items are:

  1. Suggest Similar Trait in describe stage: unit and collection method are incorrect
  2. Ensure quotes never end up in the database un-escaped.
laceysanderson commented 5 years ago

The above problems are now being handled correctly.

New problem:

laceysanderson commented 5 years ago

To my knowledge all existing issues are fixed 🤞 This is now ready for review! 🎉

reynoldtan commented 5 years ago

Updates would not let me upload any data in the system however all stages but the last stage (Save) work as expected with only small and minor issue particularly in the describe trait stage.

  1. An error when selecting the option "None of these apply". none-applies
  2. Trait name field contains the unit part that may cause confusion when entering the unit. In addition, the unit field should be prefilled with the unit should user decides not to enter other unit.
  3. Autocomplete unit field should be based on human readable and non abbreviated full name when generating suggestions since label indicates to enter 'full name of the unit'. Entering centimeter will not suggest any unit.
  4. Reusing an existing trait allows altering of 'collection method' and 'unit'. field-is enabled-reusing-trait
  5. Mean distribution summary table is empty. summary-table-empty
  6. Save error. save-error This should be a quick fix by applying same trimming (space and quotes) applied to a row in saving data (stage 4) to that in saving trait names (stage 3). @see line #400 of upload.page.inc and line #914 of validators.inc.

Other comments/enhancements:

  1. As pointed out by @carolyncaron, update user guide to include information about expected file format. I would further suggest to include a sample blank data collection file in both tsv and txt format.
  2. To silence or mute AJAX error with server resource, debugging info and etc we need set alert function to return void.
  3. Need additional configuration for terms: UO, Name, Method and Unit.
  4. My apologies I've overlooked this section in the previous PR request. Functions in traits.api use select chado record to fetch cvterm. A built in function (we both designed and gone lengthy trail and error) to fetch cvterm is available in cv.api. For example:

// Search for an exact match of the trait nae in the expected cv. $result = chado_select_record('cvterm', array('cvterm_id'), array('cv_id' => $cv_id, 'name' => $trait_name));

=

// Returns cvterm id number. $result = ap_get_cvterm( array('name' => $trait_name, 'cv_id' => $cv_id) );

// Return cvterm id + other prop $result = ap_get_cvterm( array('name' => $trait_name, 'cv_id' => $cv_id), array('dataset' => 'fullset') );

In addition to getting terms, inserting terms is also available.

  1. What appears to be tab (4 spaces) used to indent lines in functions ap_get_mthod_id() and ap_get_unit_id().
  2. Short array syntax in function parameter ($option=[] in ap_get_method_units() and ap_get_trait_methods()).

Please let me know if #6 is solved so I can test other functionality (summary and exporter). Thank you!

reynoldtan commented 5 years ago

My first upload went smooth, no errors and I was able to visualize it as well as download. I attempted to upload a second dataset with the same trait name (no unit) and tried both entering a non-existing unit and one suggested by unit field and in both trials I got this error in the terminal

screen shot 2018-12-06 at 11 37 50 am

Here is my data file. AP_R7_Trait_324-no-unit.txt

Test done in dev/B

Download and Summary chart both work without a glitch 👍 👍.

laceysanderson commented 5 years ago

That bug is fixed @reynoldtan. It should be ready for review again 🤞

reynoldtan commented 5 years ago

Tested and confirmed to work:

  1. A validation error will be triggered when traits have unit. screen shot 2018-12-07 at 8 55 30 am 2

  2. Trait had multiple unit and/or method of collection: screen shot 2018-12-07 at 8 59 16 am 2

An enhancement to this would be to turn unit or collection method suggestions into an active link that when clicked on would fill in the related field, in the same way when choosing ontology term.

  1. Uploads
    • New trait
    • Same trait, selected a suggested trait, entered a new collection method and used existing unit.
    • Same trait, selected a suggested trait, used existing unit and entered new collection method.
    • Same trait, selected a suggested trait, used suggested collection method and unit.
    • Same trait, selected 'None of these apply' option.
    • Multi traits (15 traits)

Errors: 1. screen shot 2018-12-07 at 9 10 46 am 2 Variable array dump on top of the page appears and sometimes not in the last stage.

2. screen shot 2018-12-07 at 12 29 01 pm 2 Clicking next does not trigger a Drupal form error and goes to an unfriendly page.

3. screen shot 2018-12-07 at 10 55 10 am File with missing expected column headers got a check in validation. This one's on me @laceysanderson. Please update analyzedphenotypes_validator_datafile() validator in validators.inc for case 'columns', Line #380 if (!empty($column_row)) { should be $columns_row (plural) and just after the open curly brace of this if, Line #381 we need this line $file_columns = str_getcsv($columns_row, "\t");

4. screen shot 2018-12-07 at 12 55 11 pm 2 I got instructed to copy a unit or collection method but with no text options to copy.

laceysanderson commented 5 years ago

To my knowledge all errors mentioned above in review are fixed 🎉 Back to you @reynoldtan

reynoldtan commented 5 years ago

👍 Confirmed column headers validator screen shot 2018-12-17 at 8 40 49 am 2

👍 Variable dump in stage 4 👍 Nice error screen shot 2018-12-17 at 9 02 30 am 2 👍 Not required help text screen shot 2018-12-17 at 9 19 45 am 2