Tracer and infusates loader

FEATURE REQUEST

Inspiration

The animal/sample table loader is bloated, so it makes sense to break it up. Doing so will also standardize the loaders around a common interface and common input file types.

Description

Create separate sheets for the tracer and infusate data, and create separate loaders to load them.

Alternatives

None

Dependencies

Parent issue-tracking issue:

Comment

I created an example version of the Study Excel doc:

study.xlsx

ISSUE OWNER SECTION

Assumptions

None

Limitations

None

Affected Components

change: load_animals_samples.py
change: sample_table_loader.py
add: load_tracers.py
add: tracer_loader.py

Requirements

Fleshed out a bit more than in #753:

~[ ] 1. Tracers must be loaded before infusates (i.e. exist in the DB)~^
~[ ] 2. Tracer name must be filled in by google doc function so it can be copied to the infusates tab~^
[x] 3. Either the tracer Name or the other columns in the Tracers tab can be filled in. If both are filled in, the default is to use the individual columns (not the name column)
~[ ] 4. Infusate name must be filled in by google doc function~^
[x] 5. Either the infusate Name or the other columns in the Infusates tab can be filled in. If both are filled in, the default is to use the individual columns (not the name column)

^ Regarding item 1, I determined that the load order will be addressed in a separate issue (mainly because I don't want to break main, but also because it fits into a logically different effort). Items 2 & 4 conflict with items 3 & 5: the spreadsheet may well populate the names automatically, but the scripts can easily construct them.
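The precedence rule in items 3 and 5 can be illustrated with a minimal sketch. It is not the loader's actual code: the function name `resolve_tracer_row`, the dictionary keys, and the example tracer name are all hypothetical, and label positions are left out for brevity.

```python
def resolve_tracer_row(row: dict) -> dict:
    """Decide whether a Tracers-tab row is loaded from its columns or its name.

    Hypothetical helper: the keys below are illustrative, not the real headers.
    """
    column_keys = ("compound", "element", "mass_number", "label_count")
    columns = {key: row.get(key) for key in column_keys}
    has_columns = all(value not in (None, "") for value in columns.values())
    name = row.get("tracer_name")

    if has_columns:
        # Both filled in, or columns only: the individual columns win.
        return {"source": "columns", **columns}
    if name:
        # Name only: the name would have to be parsed downstream.
        return {"source": "name", "tracer_name": name}
    raise ValueError("Row needs either a tracer name or the individual columns")
```

Under these assumptions, a row with both the name and the columns filled in resolves to `"columns"`, and a row with only a name resolves to `"name"`.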

The requirements from #753:

[x] 6. New loader scripts

[x] 6.1. Tracers Loader

[x] 6.2. Infusates Loader

[x] 8.6. Infusates Tab

[x] 8.6.1. Add Columns

[x] 8.6.1.1. Infusate Number

[x] 8.6.1.2. Tracer Group Name

[x] 8.6.1.3. Infusate Name

[x] 8.6.1.4. Tracer Number

[x] 8.6.1.5. Tracer Concentration

[x] 8.7. Tracers Tab

[x] 8.7.1. Add Columns

[x] 8.7.1.1. Tracer Number

[x] 8.7.1.2. Compound Name

[x] 8.7.1.3. Element

[x] 8.7.1.4. Mass Number

[x] 8.7.1.5. Label Count

[x] 8.7.1.6. Label Positions

[x] 8.7.1.7. Tracer Name (based on Study doc, Tracers tab contents)

DESIGN

Interface Change description

2 new load scripts.

The infusate records span multiple rows. For 1 infusate, there will be a row for each unique tracer and concentration combo.

Similarly, the tracer records will span multiple rows. For 1 tracer, there will be a row for each unique labeled element.

Code Change Description

Should be fairly straightforward, using other loaders as a template. Infusate number and tracer number will be used to identify distinct infusates/tracers, but those numbers will not be loaded into the DB.
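As a rough sketch of the multi-row idea, the snippet below groups sheet rows by tracer number and then discards the number. The function name `group_tracer_rows` and the row keys are assumptions made for illustration; the real loader may structure this differently. The same pattern would apply to infusate rows keyed by infusate number.

```python
from collections import defaultdict


def group_tracer_rows(rows):
    """Collect one entry per tracer number, with one isotope per sheet row."""
    tracers = defaultdict(lambda: {"compound": None, "isotopes": []})
    for row in rows:
        entry = tracers[row["tracer_number"]]
        entry["compound"] = row["compound"]
        entry["isotopes"].append(
            {
                "element": row["element"],
                "mass_number": row["mass_number"],
                "count": row["label_count"],
            }
        )
    # The number is only a grouping key, so it is not kept in the result.
    return list(tracers.values())


# Made-up example: tracer 1 spans two rows (two labeled elements).
example_rows = [
    {"tracer_number": 1, "compound": "glutamine", "element": "C", "mass_number": 13, "label_count": 5},
    {"tracer_number": 1, "compound": "glutamine", "element": "N", "mass_number": 15, "label_count": 2},
    {"tracer_number": 2, "compound": "lysine", "element": "C", "mass_number": 13, "label_count": 6},
]
grouped = group_tracer_rows(example_rows)
```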

Tests

A test for each requirement

Going with a unit-test strategy, though the load_tracers.py tests map directly to the requirements.

DataRepo/loaders/table_loader.py (TableLoader())
- new methods
  - [x] _get_pretty_headers_helper(reqd_headers, delim, _first_dim, _anded, markers)
  - [x] get_missing_headers(supd_headers, reqd_headers, _anded, _first)
  - [x] header_keys_to_names(ndim_header_keys, headers)
  - [x] get_invalid_types_from_ndim_strings(ndim_strings)
  - [x] flatten_ndim_strings(ndim_strings)
  - [x] check_dataframe_values(reading_defaults)
  - [x] get_missing_values(supd_val_headers, reqd_headers, _anded, _first)
- modified methods
  - [x] get_pretty_headers(headers, markers, legend, reqd_only, reqd_spec, all_reqd) [prev version had no args]
  - [x] check_unique_constraints(df) [added the df arg]
DataRepo/loaders/tracers_loader.py (TracersLoader())
- [x] init_load()
- [x] load_data()
- [x] build_tracer_dict()
- [x] load_tracer_dict()
- [x] get_row_data(row)
- [x] check_extract_name_data()
- [x] get_or_create_tracer(entry)
- [x] get_tracer(entry)
- [x] get_compound(compound_name)
- [x] create_tracer(compound_rec)
- [x] get_or_create_tracer_label(isotope_dict, tracer_rec)
- [x] parse_label_positions(positions_str)
- [x] check_data_is_consistent(tracer_number, compound_name, tracer_name)
- [x] buffer_consistency_issues()
- [x] check_tracer_name_consistent(rec, entry)
DataRepo/utils/exceptions.py (new exception classes)
- [x] InfileError(message, rownum, sheet, file, column)
- [x] CompoundDoesNotExist(name, rownum, sheet, file, column)
DataRepo/management/commands/load_tracers.py [custom option: positions string delimiter]
- [x] OK: Tracer names only
- [x] OK: Tracer column data only
- [x] OK: Mix of tracer name and column data
- [x] ERROR: Tracer name with multiple different numbers
- [x] ERROR: Tracer number with multiple different names
- [x] ERROR: Tracer number with multiple different compounds
- [x] OK: Duplicate compound, element, count, mass, and positions but with different numbers, and each tracer is associated with a different second isotope
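The first two ERROR cases above (a name with several numbers, a number with several names) could be detected along the following lines. This is only a sketch under assumed row keys; `find_name_number_conflicts` is a hypothetical name, and all issues are collected before reporting rather than stopping at the first one.

```python
from collections import defaultdict


def find_name_number_conflicts(rows):
    """Return messages for names tied to several numbers and vice versa."""
    numbers_by_name = defaultdict(set)
    names_by_number = defaultdict(set)
    for row in rows:
        name, number = row.get("tracer_name"), row.get("tracer_number")
        if name is None or number is None:
            continue  # nothing to cross-check on this row
        numbers_by_name[name].add(number)
        names_by_number[number].add(name)

    issues = []
    for name, numbers in numbers_by_name.items():
        if len(numbers) > 1:
            issues.append(f"Tracer name {name!r} has multiple numbers: {sorted(numbers)}")
    for number, names in names_by_number.items():
        if len(names) > 1:
            issues.append(f"Tracer number {number} has multiple names: {sorted(names)}")
    return issues


# Made-up rows: the same name appears under numbers 1 and 2.
conflicts = find_name_number_conflicts(
    [
        {"tracer_name": "lysine-[13C6]", "tracer_number": 1},
        {"tracer_name": "lysine-[13C6]", "tracer_number": 2},
    ]
)
```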

Possible improvements.

I noted that with the tracer data, these class attributes:

[x] DataRequiredHeaders
[x] DataRequiredValues

...could be reconfigured to support conditionally required logic, e.g. either column 1 or 2 is (or both are) required. Might be an easy change.
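One way conditionally required headers might be expressed is a nested list in which each nesting level alternates between AND and OR. The sketch below is an assumption about how that could look, not the existing TableLoader behavior; `headers_satisfied` and the example spec are hypothetical, with header names borrowed from the Tracers tab requirements.

```python
def headers_satisfied(spec, supplied, anded=True):
    """Evaluate a nested header requirement against the supplied headers.

    A string is satisfied when it is in `supplied`. A list is satisfied when
    all (anded=True) or any (anded=False) of its members are satisfied, and
    each nesting level flips between AND and OR.
    """
    if isinstance(spec, str):
        return spec in supplied
    results = [headers_satisfied(item, supplied, not anded) for item in spec]
    return all(results) if anded else any(results)


# Top level is AND; the inner list is OR; the innermost list is AND again:
# "Tracer Name" OR all four of the individual columns.
TRACER_SPEC = [
    ["Tracer Name", ["Compound Name", "Element", "Mass Number", "Label Count"]],
]
```

With this spec, a sheet that has only `Tracer Name` passes, a sheet that has all four individual columns passes, and a sheet with only `Compound Name` does not.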

Also, I noted that:

FieldToDataHeaderKey

...could feasibly map a single model field to multiple columns. The mapping is only used for error reporting of database errors, after all. The field types don't even have to be the same (e.g. `Tracer.compound` is a foreign key, and it can reference the "compound name" column). So `TracerLabel.name` could reference the columns "element", "count", "mass number", and "positions".

Though I will also point out that since Tracer.name and TracerLabel.name are both maintained fields that are controlled via autoupdate, they are prevented from being included in model object creations or calls to Model.objects.create(), so that is currently moot.
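For illustration only, a field-to-columns mapping with list values might look like the sketch below. The dictionary names, header keys, and `columns_for_field` helper are hypothetical and do not reflect the current shape of `FieldToDataHeaderKey`.

```python
# Hypothetical header keys mapped to the column names shown to the user.
HEADERS = {
    "COMPOUND": "Compound Name",
    "ELEMENT": "Element",
    "MASS_NUMBER": "Mass Number",
    "COUNT": "Label Count",
    "POSITIONS": "Label Positions",
}

# Each model field maps to a list of header keys instead of a single key.
FIELD_TO_HEADER_KEYS = {
    "Tracer": {"compound": ["COMPOUND"]},
    "TracerLabel": {"name": ["ELEMENT", "COUNT", "MASS_NUMBER", "POSITIONS"]},
}


def columns_for_field(model_name, field_name):
    """Return the sheet columns to cite when a DB error names this field."""
    keys = FIELD_TO_HEADER_KEYS.get(model_name, {}).get(field_name, [])
    return [HEADERS[key] for key in keys]
```

An error on `TracerLabel.name` could then be reported against all four contributing columns rather than a single one.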

The infusates loader class has nearly the same set of tests as the tracers loader class.

The load_infusates.py tests:

DataRepo/management/commands/load_infusates.py [custom option: tracers string delimiter]
- [x] OK: Infusate names, numbers, and tracer concentrations only. Numbers are required because concentrations are not embedded in the name, i.e. they are spread over multiple rows. In the tracers sheet, by contrast, the isotopes were embedded in the name, so no number was needed to link multiple rows.
- [x] OK: Infusate column data only (no infusate names)
- [x] OK: Mix of infusate name and column data
- [x] ERROR: Infusate name with multiple different numbers
- [x] ERROR: Infusate number with multiple different names
- [x] ERROR: Infusate number with multiple different tracer group names
- [x] ERROR: Tracer group name with different assortment of tracers
- [x] OK: Duplicate tracer and concentration, but with different numbers, and each tracer is associated with a different second tracer or concentration
