Open hepcat72 opened 6 months ago
Going with a unit-test strategy, though the `load_tracers.py` tests apply directly to the requirements.
DataRepo/loaders/table_loader.py (TableLoader())
_get_pretty_headers_helper(reqd_headers, delim, _first_dim, _anded, markers)
get_missing_headers(supd_headers, reqd_headers, _anded, _first)
header_keys_to_names(ndim_header_keys, headers)
get_invalid_types_from_ndim_strings(ndim_strings)
flatten_ndim_strings(ndim_strings)
check_dataframe_values(reading_defaults)
get_missing_values(supd_val_headers, reqd_headers, _anded, _first)
get_pretty_headers(headers, markers, legend, reqd_only, reqd_spec, all_reqd) [prev version had no args]
check_unique_constraints(df) [added the df arg]
DataRepo/loaders/tracers_loader.py (TracersLoader())
init_load()
load_data()
build_tracer_dict()
load_tracer_dict()
get_row_data(row)
check_extract_name_data()
get_or_create_tracer(entry)
get_tracer(entry)
get_compound(compound_name)
create_tracer(compound_rec)
get_or_create_tracer_label(isotope_dict, tracer_rec)
parse_label_positions(positions_str)
check_data_is_consistent(tracer_number, compound_name, tracer_name)
buffer_consistency_issues()
check_tracer_name_consistent(rec, entry)
DataRepo/utils/exceptions.py (new exception classes)
InfileError(message, rownum, sheet, file, column)
CompoundDoesNotExist(name, rownum, sheet, file, column)
DataRepo/management/commands/load_tracers.py
[custom option: positions string delimiter]
Possible improvements.
I noted that with the tracer data, these class attributes:
DataRequiredHeaders
DataRequiredValues
...could be reconfigured to support conditionally required logic, e.g. either column 1 or 2 is (or both are) required. Might be an easy change.
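A minimal sketch of how conditionally required headers could be evaluated, assuming the requirement is written as a nested list whose levels alternate between "all of" and "any of". The function name `missing_headers`, the spec layout, and the header names are hypothetical illustrations, not the actual `TableLoader` implementation.

```python
def missing_headers(supplied, required, anded=True):
    """Return the required headers that are missing from `supplied`.

    `required` is a nested list whose levels alternate between "all of"
    (and) and "any of" (or), starting with "all of" at the top level.
    An empty result means the requirement is satisfied.
    """
    missing = []
    for item in required:
        if isinstance(item, list):
            # A nested list flips the and/or sense for its members.
            sub_missing = missing_headers(supplied, item, not anded)
            if sub_missing:
                missing.append(sub_missing)
        elif item not in supplied:
            missing.append(item)
    if anded:
        return missing
    # "any of": satisfied as soon as at least one member is present.
    return [] if len(missing) < len(required) else missing


# Hypothetical spec: the number column is always required, plus either
# (or both) of the two columns in the inner list.
spec = ["Tracer Number", ["Tracer Name", "Compound"]]
```

With this layout, "either column 1 or 2 is (or both are) required" is simply an inner list, so no separate flag per column is needed.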
Also, I noted that:
FieldToDataHeaderKey
...could feasibly map a single model field to multiple columns. The mapping is only used for error reporting of database errors, after all. The field types don't even have to be the same (e.g. `Tracer.compound` is a foreign key, and it can reference the "compound name" column). So `TracerLabel.name` could reference the columns "element", "count", "mass number", and "positions".
Though I will also point out that since `Tracer.name` and `TracerLabel.name` are both maintained fields that are controlled via autoupdate, they are prevented from being included in model object creations or calls to `Model.objects.create()`, so that is currently moot.
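The multi-column mapping idea could be sketched roughly as follows. This is only an illustration under assumptions: the dictionary name `FIELD_TO_HEADERS`, its layout, and the helper `columns_for_field` are hypothetical and do not reflect the real `FieldToDataHeaderKey` structure.

```python
# Hypothetical layout: model name -> field name -> one column or several.
FIELD_TO_HEADERS = {
    "Tracer": {"compound": "compound name"},
    "TracerLabel": {"name": ["element", "count", "mass number", "positions"]},
}


def columns_for_field(model_name, field_name):
    """Return the list of columns to cite when a database error names a field."""
    value = FIELD_TO_HEADERS.get(model_name, {}).get(field_name)
    if value is None:
        return []
    if isinstance(value, str):
        # A single column is normalized to a one-element list.
        return [value]
    return list(value)
```

Because the lookup is only consulted when reporting an error, returning a list in every case keeps the calling code the same whether a field maps to one column or to many.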
The infusates loader class has almost exactly the same allotment of tests as the tracers loader class.
The `load_infusates.py` tests:
DataRepo/management/commands/load_infusates.py
[custom option: tracers string delimiter]
FEATURE REQUEST
Inspiration
The animal/sample table loader is bloated, so it makes sense to break it up. This will also standardize the loaders to have a common interface with common file types to input.
Description
Create separate sheets for the tracer and infusate data, and create separate loaders to load them.
Alternatives
None
Dependencies
Parent issue-tracking issue: #753
Comment
I created an example version of the Study Excel doc:
study.xlsx
ISSUE OWNER SECTION
Assumptions
None
Limitations
None
Affected Components
load_animals_samples.py
sample_table_loader.py
load_tracers.py
tracer_loader.py
Requirements
Fleshed out a bit more than in #753:
1. Tracers must be loaded before infusates (i.e. exist in the DB)^
2. Tracer name must be filled in by Google Doc function so it can be copied to the infusates tab^
3. Either the tracer Name or the other columns in the Tracers tab can be filled in. If both are filled in, the default is to use the individual columns (not the name column)
4. Infusate name must be filled in by Google Doc function^
5. Either the infusate Name or the other columns in the Infusates tab can be filled in. If both are filled in, the default is to use the individual columns (not the name column)

^ In reference to item 1, I determined that the load order will be addressed in a separate issue (mainly because I don't want to break main, but also because it fits into a logically different effort). Items 2 & 4 conflict with items 3 & 5. The spreadsheet very well may automatically populate the names, but the scripts can easily construct them.
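The "name or individual columns" rule in the requirements above could be sketched like this. It is a hedged illustration only: the function `pick_tracer_source`, the column names, and the choice to treat any filled detail column as "columns are filled in" are assumptions, not the loader's actual behavior.

```python
def pick_tracer_source(
    row,
    name_col="Tracer Name",
    detail_cols=("Compound", "Element", "Mass Number", "Count", "Positions"),
):
    """Decide whether a row is described by its individual columns or its name.

    Returns "columns" when any detail column has a value (the default when
    both are filled in), "name" when only the name column has a value, and
    raises ValueError when neither is filled in.
    """

    def filled(col):
        value = row.get(col)
        return value is not None and str(value).strip() != ""

    if any(filled(col) for col in detail_cols):
        return "columns"
    if filled(name_col):
        return "name"
    raise ValueError("Either the name or the individual columns are required")
```

The same shape would apply to the Infusates tab by passing a different `name_col` and `detail_cols`.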
The requirements from #753:
DESIGN
Interface Change description
2 new load scripts.
The infusate records span multiple rows. For 1 infusate, there will be a row for each unique tracer and concentration combo.
Similarly, the tracer records will span multiple rows. For 1 tracer, there will be a row for each unique labeled element.
Code Change Description
Should be fairly straightforward, using other loaders as a template. Infusate number and tracer number will be used to identify distinct infusates/tracers, but those numbers will not be loaded into the DB.
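As a rough sketch of the multi-row design, rows could be grouped by their tracer (or infusate) number before any records are built. The function name `group_rows_by_number`, the column names, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def group_rows_by_number(rows, number_col="Tracer Number"):
    """Group spreadsheet rows that share the same tracer (or infusate) number."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[number_col]].append(row)
    return dict(groups)


# Made-up rows for illustration only: tracer 1 has two labeled elements,
# so it spans two rows.
rows = [
    {"Tracer Number": 1, "Compound": "lysine", "Element": "C"},
    {"Tracer Number": 1, "Compound": "lysine", "Element": "N"},
    {"Tracer Number": 2, "Compound": "glucose", "Element": "C"},
]
tracers = group_rows_by_number(rows)
```

The number serves only as a grouping key within the file; consistent with the description above, it would be discarded once each group has been turned into a record.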
Tests
A test for each requirement