Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License
4 stars 1 forks source link

Update loading code to recognize `D_label` string, but store and present as `H` #736

Open jcmatese opened 1 year ago

jcmatese commented 1 year ago

FEATURE REQUEST

Inspiration

Recently saw some datafiles where the labeled element was D (as in deuterium)

Currently, we are expecting H, and if the data file is not edited by hand, it throws an error.

Description

Accept D in the data file, but store an H the database.

Alternatives

None

Dependencies

This issue cannot be started until the completion of the following issue(s)/ pull request(s):

Comment

Making some accommodations within search would be nice, but will be part of a separate, lower priority, effort.


ISSUE OWNER SECTION

Assumptions

  1. List of assumptions that the code will not explicitly address/check
  2. E.g. We will assume input is correct (explaining why there is no validation)

Limitations

  1. A list of things this work will specifically not do
  2. E.g. This feature will only handle the most frequent use case X

Affected Components

https://github.com/Princeton-LSI-ResearchComputing/tracebase/blob/80e8c3659951b3bca050ef3bb86d3694e537f6e5/DataRepo/utils/accucor_data_loader.py#L80-L93

https://github.com/Princeton-LSI-ResearchComputing/tracebase/blob/80e8c3659951b3bca050ef3bb86d3694e537f6e5/DataRepo/utils/accucor_data_loader.py#L230-L233

Requirements

DESIGN

Interface Change description

None provided

Code Change Description

None provided

Tests


Reading Accucor file:
tracebase-rabinowitz-data/mneinast_exp048a_carnosine/exp048a_Carn_pos_lowmz_corrected.xlsx
LOADING WITH PREFIX: None
  File "./manage.py", line 22, in <module>
    main()
  File "./manage.py", line 18, in main
    execute_from_command_line(sys.argv)
  File "DataRepo/management/commands/load_study.py", line 300, in handle
    call_command(
  File "DataRepo/management/commands/load_accucor_msruns.py", line 141, in handle
    loader.load_accucor_data()
  File "DataRepo/utils/accucor_data_loader.py", line 1295, in load_accucor_data
    self.aggregated_errors_object.buffer_error(e)
  File "DataRepo/utils/exceptions.py", line 886, in buffer_error
    self.buffer_exception(exception, is_error=True, is_fatal=is_fatal)
  File "DataRepo/utils/exceptions.py", line 845, in buffer_exception
    buffered_tb_str = self.get_buffered_traceback_string()
  File "DataRepo/utils/exceptions.py", line 829, in get_buffered_traceback_string
    for step in traceback.format_stack()

The above caught exception had a partial traceback:

  File "DataRepo/utils/accucor_data_loader.py", line 1273, in load_accucor_data
    self.preprocess_data()
  File "DataRepo/utils/accucor_data_loader.py", line 230, in preprocess_data
    self.labeled_element_header = self.accucor_corrected_df.filter(
  File ".venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4604, in __getitem__
    return getitem(key)
EXCEPTION1(ERROR): IndexError: index 0 is out of bounds for axis 0 with size 0
Traceback (most recent call last):
  File "./manage.py", line 22, in <module>
    main()
  File "./manage.py", line 18, in main
    execute_from_command_line(sys.argv)
  File ".venv/lib/python3.9/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File ".venv/lib/python3.9/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File ".venv/lib/python3.9/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File ".venv/lib/python3.9/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "DataRepo/management/commands/load_study.py", line 338, in handle
    raise self.load_statuses.get_final_exception()
DataRepo.utils.exceptions.AggregatedErrorsSet: 1 categories had exceptions, including 1 in an error state and 0 in a warning state.
tracebase-rabinowitz-data/mneinast_exp048a_carnosine/exp048a_Carn_pos_lowmz_corrected.xlsx: AggregatedErrors Summary (1 errors / 0 warnings):
    EXCEPTION1(ERROR): IndexError: index 0 is out of bounds for axis 0 with size 0
Scroll up to see tracebacks for each exception printed as it was encountered.
hepcat72 commented 1 year ago

IMO, it should be H and the mass_number should be 2. (I'd always thought that the regular form of any element had the same number of neutrons as protons, but I had to look up that hydrogen's regular form doesn't have a neutron because it's too small to accommodate it, so the mass number of deuterium is not 3 as I would have assumed - it's 2.)

All the other elements we keep are represented as isotopes using their mass number alone, e.g. 14C as opposed to the regular 12C. Hydrogen is somewhat of a special case, as noted above, but I feel like handling the special case differently would cause some people to input 2D and some to input 2H, which would make it less searchable, so if we stick with 1 representation, we avoid complexity.

hepcat72 commented 1 year ago

Maybe we should accept D in the datafile, but the loader should change it to H in the database? Perhaps we should make similar accommodations in search and display(/templates)... In either case, there should be 1 representation in the DB. I know that the advanced search uses a select list for elements/labels and in that context, we could have the list item label display deuterium and tritium, etc in parens.

lparsons commented 1 year ago

I would agree that we should have one representation in the database, but we should recognize the D if that is what (El-)Maven exports. It might be nice to make some accommodations within search too, but that is lower priority that the loading code, imo.

hepcat72 commented 1 year ago

I would agree that we should have one representation in the database, but we should recognize the D if that is what (El-)Maven exports. It might be nice to make some accommodations within search too, but that is lower priority that the loading code, imo.

Not sure the reason for the "but". It sounds like you completely agree with me. And I agree that the interface representation is lower priority.