Closed jcmatese closed 8 months ago
Just a quick FYI: someone can go ahead and merge https://github.com/PrincetonUniversity/tracebase-rabinowitz-data/pull/98 if that will help work around this issue while we work on getting it fixed.
What I mean to say is all those lines probably have the same category of aggregated error, but the reporting/association of the data might be incorrect?
Yes, I can work around it by dropping my local database, but I just wanted to report it, as it will likely pop up in all the loaders (if they all inherit/use that aggregated error reporting).
The inaccurate row numbers are a known and documented minor issue that will be fixed in the current refactor. The error itself, however, is correct. The refactor should also reduce the error summarization so that it doesn't spit out so many lines about the same issue.
The question is whether example data with different case has contaminated your database or whether the case difference is in the file you're loading.
So does "O-phosphohomoserine" exist in your input file? If not, I suspect leftovers from a previous differing load.
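One quick way to answer that question is to scan the input file for names that match ignoring case. This is a minimal sketch, not part of the loader: the function name `find_case_variants` and the `Compound` column header are assumptions made for illustration, so adjust the column name to whatever the TSV actually uses.

```python
import csv
import io


def find_case_variants(tsv_text, target, column="Compound"):
    """Return (row_number, value) pairs whose value equals target ignoring case.

    Row numbers are 1-based and count the header line as row 1.
    The column name "Compound" is an assumption about the file's header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row_number, row in enumerate(reader, start=2):
        value = row.get(column) or ""
        if value.casefold() == target.casefold():
            hits.append((row_number, value))
    return hits


# Tiny in-memory sample standing in for the real compounds file.
sample = (
    "Compound\tFormula\tHMDB ID\n"
    "o-cresol\tC7H8O\tHMDB0002055\n"
    "O-phosphohomoserine\tC4H10NO6P\tHMDB0003484\n"
)
hits = find_case_variants(sample, "o-phosphohomoserine")
print(hits)
```

If the capitalized spelling shows up in the hits, the case difference is in the file being loaded; if only the lowercase spelling shows up, the other spelling most likely came from an earlier load.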
`o-phosphohomoserine` is line 544 of that file. That is the correct error report for that line. The issue is that the other lines are also reporting o-phosphohomoserine, but they should probably be reporting other data like
o-acetylserine C5H9NO4 HMDB0003011
o-cresol C7H8O HMDB0002055
o-phosphoethanolamine C2H8NO4P HMDB0000224
Presumably also with case differences...
OK. That could be correct. We just spoke. To summarize, I think you may be right and the reported lines do have errors, but I think the problem is just in the summarization code that takes all the buffered `ConflictingValueError` objects and puts them into a single `ConflictingValueErrors` object. Thanks for the clarification.
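To make the intended behavior concrete, here is a minimal sketch of a summary that reads every field from the error it is currently summarizing. The two classes are hypothetical stand-ins written for this example, not the real classes in `DataRepo.utils.exceptions`, and the actual cause of the bug is not stated in this thread. Copying each record when the error is created is one way to keep a later row from overwriting what an earlier error reports.

```python
class ConflictingValueError(Exception):
    """Hypothetical stand-in: one conflict between a file row and a DB record."""

    def __init__(self, rownum, file_rec, db_rec):
        self.rownum = rownum
        # Copy both records so that a dict reused for later rows cannot
        # change what this error reports.
        self.file_rec = dict(file_rec)
        self.db_rec = dict(db_rec)
        super().__init__(f"row {rownum}: {self.file_rec} != {self.db_rec}")


class ConflictingValueErrors(Exception):
    """Hypothetical stand-in: one summary built from many buffered errors."""

    def __init__(self, errors):
        self.errors = list(errors)
        lines = ["Conflicting values encountered during loading:"]
        for err in self.errors:
            # Every value below comes from err itself, never from a variable
            # left over from the loop that buffered the errors.
            lines.append(f"File record: {err.file_rec} (on rows: {err.rownum})")
            lines.append(f"Database record: {err.db_rec}")
            for field in sorted(err.file_rec):
                if err.file_rec[field] != err.db_rec.get(field):
                    lines.append(f"[{field}] values differ:")
                    lines.append(f"- database: [{err.db_rec.get(field)}]")
                    lines.append(f"- file: [{err.file_rec[field]}]")
        super().__init__("\n".join(lines))


# Simulate a loader that reuses one dict for every row it reads.
buffered = []
row = {}
for rownum, name, hmdb_id in [
    (545, "O-phosphoethanolamine", "HMDB0000224"),
    (546, "O-phosphohomoserine", "HMDB0003484"),
]:
    row["name"] = name
    row["hmdb_id"] = hmdb_id
    db_rec = {"name": name.lower(), "hmdb_id": hmdb_id}
    buffered.append(ConflictingValueError(rownum, row, db_rec))

summary = ConflictingValueErrors(buffered)
print(summary)
```

Without the `dict(...)` copies, both buffered errors would point at the same `row` dict and the summary would show the last name for every row, which is the symptom reported here.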
A minor bug. Should be a quick fix. And I should be able to fix that shortly. But as long as we're aware of what's happening, you should be able to proceed.
And I suspect that perhaps this came about because you loaded data, edited it for case, and loaded again. That's when you would run into these conflicting value errors.
I have this reporting error fixed. Found a few additional related minor bugs having to do with stats reporting.
New output will look like this:
DataRepo.utils.exceptions.AggregatedErrors: 2 exceptions occurred, including type(s): [ConflictingValueErrors, DuplicateValueErrors].
AggregatedErrors Summary (2 errors / 0 warnings):
EXCEPTION1(ERROR): ConflictingValueErrors: Conflicting values encountered during loading:
During the processing of file [/Users/rleach/Temporary/compounds_2wksago.tsv]...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'M-aminobenzoic acid', 'formula': 'C7H7NO2', 'hmdb_id': 'HMDB0001891'} (on rows: 464)
Database record: {'id': 911, 'name': 'm-aminobenzoic acid', 'formula': 'C7H7NO2', 'hmdb_id': 'HMDB0001891'}
[name] values differ:
- database: [m-aminobenzoic acid]
- file: [M-aminobenzoic acid]
File record: {'name': 'M-coumaric acid', 'formula': 'C9H8O3', 'hmdb_id': 'HMDB0001713'} (on rows: 465)
Database record: {'id': 912, 'name': 'm-coumaric acid', 'formula': 'C9H8O3', 'hmdb_id': 'HMDB0001713'}
[name] values differ:
- database: [m-coumaric acid]
- file: [M-coumaric acid]
File record: {'name': 'M-cresol', 'formula': 'C7H8O', 'hmdb_id': 'HMDB0002048'} (on rows: 466)
Database record: {'id': 913, 'name': 'm-cresol', 'formula': 'C7H8O', 'hmdb_id': 'HMDB0002048'}
[name] values differ:
- database: [m-cresol]
- file: [M-cresol]
File record: {'name': 'monoacylglycerol NA(22:4)', 'formula': 'C25H41O4Na', 'hmdb_id': 'FakeHMDB050'} (on rows: 494)
Database record: {'id': 941, 'name': 'monoacylglycerol Na(22:4)', 'formula': 'C25H41O4Na', 'hmdb_id': 'FakeHMDB050'}
[name] values differ:
- database: [monoacylglycerol Na(22:4)]
- file: [monoacylglycerol NA(22:4)]
File record: {'name': 'O-acetylserine', 'formula': 'C5H9NO4', 'hmdb_id': 'HMDB0003011'} (on rows: 543)
Database record: {'id': 990, 'name': 'o-acetylserine', 'formula': 'C5H9NO4', 'hmdb_id': 'HMDB0003011'}
[name] values differ:
- database: [o-acetylserine]
- file: [O-acetylserine]
File record: {'name': 'O-cresol', 'formula': 'C7H8O', 'hmdb_id': 'HMDB0002055'} (on rows: 544)
Database record: {'id': 991, 'name': 'o-cresol', 'formula': 'C7H8O', 'hmdb_id': 'HMDB0002055'}
[name] values differ:
- database: [o-cresol]
- file: [O-cresol]
File record: {'name': 'O-phosphoethanolamine', 'formula': 'C2H8NO4P', 'hmdb_id': 'HMDB0000224'} (on rows: 545)
Database record: {'id': 992, 'name': 'o-phosphoethanolamine', 'formula': 'C2H8NO4P', 'hmdb_id': 'HMDB0000224'}
[name] values differ:
- database: [o-phosphoethanolamine]
- file: [O-phosphoethanolamine]
File record: {'name': 'O-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'} (on rows: 546)
Database record: {'id': 993, 'name': 'o-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
[name] values differ:
- database: [o-phosphohomoserine]
- file: [O-phosphohomoserine]
EXCEPTION2(ERROR): DuplicateValueErrors: The following unique column(s) (or column combination(s)) were found to have duplicate occurrences on the indicated rows:
file [/Users/rleach/Temporary/compounds_2wksago.tsv]
Column(s) ['HMDB ID']
HMDB0000143 (rows*: 227, 330)
HMDB0000283 (rows*: 233, 606)
Column(s) ['Synonyms']
NA (rows*: 154, 158)
C (rows*: 211, 220)
Scroll up to see tracebacks for these exceptions printed as they were encountered.
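The last line of that output refers to tracebacks printed as each exception was encountered. The pattern can be sketched roughly as follows. The `AggregatedErrors` class here is a simplified, hypothetical stand-in and `load_row` is an invented helper; neither reflects the real TraceBase code. The only library call used is the standard `traceback.format_exception`.

```python
import traceback


class AggregatedErrors(Exception):
    """Hypothetical stand-in for an exception that buffers other exceptions."""

    def __init__(self):
        self.buffered = []  # (exception, traceback text captured at buffer time)
        super().__init__("AggregatedErrors")

    def buffer(self, exc):
        # Capture the trace at the moment the error is buffered. It helps when
        # debugging the code, not when debugging the data.
        trace = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        self.buffered.append((exc, trace))

    def summary(self):
        # The summary carries everything needed to fix the data.
        lines = [f"AggregatedErrors Summary ({len(self.buffered)} errors):"]
        for i, (exc, _trace) in enumerate(self.buffered, start=1):
            lines.append(f"EXCEPTION{i}(ERROR): {type(exc).__name__}: {exc}")
        return "\n".join(lines)


def load_row(rownum, name, known):
    """Invented helper: reject a name that differs from a known one only by case."""
    if name.lower() in known and name not in known:
        raise ValueError(f"row {rownum}: [{name}] conflicts with an existing name")


aes = AggregatedErrors()
known = {"o-cresol"}
for rownum, name in enumerate(["O-cresol", "glucose"], start=2):
    try:
        load_row(rownum, name, known)
    except ValueError as exc:
        aes.buffer(exc)  # keep going so that all errors are reported together

print(aes.summary())
```

Buffering instead of raising immediately lets one run report every bad row, at the cost of printing traces that are only interesting to developers.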
And I would like to point out that the errors above lines that start with "`AggregatedErrors Summary`" are:
- the trace of the `AggregatedErrors` exception (the immediate trace above that line)
- the traces printed further above "`AggregatedErrors Summary`", i.e. the trace at the time each error was buffered.
Those traces only serve the utility of debugging the code. There's no reason to look at them if you are debugging erroneous data. All the relevant information should be contained in the summary, unless there is a bug.
BUG DESCRIPTION
Loading compounds to an existing/pre-loaded database is throwing errors because of differing compound names, but the line number to data content seems to be mismatched? (last line encountered may be overwriting prior offending data?)
Problem
Executed `python manage.py load_compounds --infile tracebase-rabinowitz-data/compounds/compounds.tsv`. The error is below.

What you can see is that the "same" name(s) are throwing the "same" errors, but at different indexed lines. So `o-phosphohomoserine` != `O-phosphohomoserine`, but this is claimed at rows 462-464, 492, and 541-544. However, if you go to those lines, there is different data, and o-phosphohomoserine is just the last row.name encountered (example and full error below).

![Screenshot 2024-02-19 at 1 42 12 PM](https://github.com/Princeton-LSI-ResearchComputing/tracebase/assets/6091114/9fb37528-ae08-40ab-88dd-f33b70b36bbf)
![Screenshot 2024-02-19 at 1 42 36 PM](https://github.com/Princeton-LSI-ResearchComputing/tracebase/assets/6091114/ab3ad1d3-fe8f-4f95-aee5-900813507902)

```
Compound records loaded: [0], skipped: [0], and errored: [8].
CompoundSynonym records loaded: [0], skipped: [0], and errored: [0].
AggregatedErrors Summary (1 errors / 0 warnings):
EXCEPTION1(ERROR): ConflictingValueErrors: Conflicting values encountered during loading:
During the processing of row [462] in file [tracebase-rabinowitz-data/compounds/compounds.tsv]...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'o-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
Database record: {'id': 1234, 'name': 'O-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
[name] values differ:
- database: [O-phosphohomoserine]
- file: [o-phosphohomoserine]
During the processing of row [463] in file [tracebase-rabinowitz-data/compounds/compounds.tsv]...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'o-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
Database record: {'id': 1234, 'name': 'O-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
[name] values differ:
- database: [O-phosphohomoserine]
- file: [o-phosphohomoserine]
During the processing of row [464] in file [tracebase-rabinowitz-data/compounds/compounds.tsv]...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'o-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
Database record: {'id': 1234, 'name': 'O-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
[name] values differ:
- database: [O-phosphohomoserine]
- file: [o-phosphohomoserine]
During the processing of row [492] in file [tracebase-rabinowitz-data/compounds/compounds.tsv]...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'o-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
Database record: {'id': 1234, 'name': 'O-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
[name] values differ:
- database: [O-phosphohomoserine]
- file: [o-phosphohomoserine]
During the processing of row [541] in file [tracebase-rabinowitz-data/compounds/compounds.tsv]...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'o-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
Database record: {'id': 1234, 'name': 'O-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
[name] values differ:
- database: [O-phosphohomoserine]
- file: [o-phosphohomoserine]
During the processing of row [542] in file [tracebase-rabinowitz-data/compounds/compounds.tsv]...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'o-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
Database record: {'id': 1234, 'name': 'O-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
[name] values differ:
- database: [O-phosphohomoserine]
- file: [o-phosphohomoserine]
During the processing of row [543] in file [tracebase-rabinowitz-data/compounds/compounds.tsv]...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'o-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
Database record: {'id': 1234, 'name': 'O-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
[name] values differ:
- database: [O-phosphohomoserine]
- file: [o-phosphohomoserine]
During the processing of row [544] in file [tracebase-rabinowitz-data/compounds/compounds.tsv]...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'o-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
Database record: {'id': 1234, 'name': 'O-phosphohomoserine', 'formula': 'C4H10NO6P', 'hmdb_id': 'HMDB0003484'}
[name] values differ:
- database: [O-phosphohomoserine]
- file: [o-phosphohomoserine]
Traceback (most recent call last):
File "/Users/jcmatese/dev/tracebase/manage.py", line 22, in
```
None provided
Expected behavior
None provided
Suggested Change
None provided
Comment
None
ISSUE OWNER SECTION
Assumptions
Limitations
Affected Components
Requirements
DESIGN
GUI Change description
None provided
Code Change Description
None provided
Tests