Open FFroehlich opened 3 years ago
Linting complains about invalid condition IDs. I believe the error is due to the fact that the condition id is numeric only.
Correct.
Change the spec to be more accurate about identifier restrictions or change linting to be more permissive about identifiers.
https://petab.readthedocs.io/en/latest/documentation_data_format.html#overview says:
All identifiers must consist only of upper and lower case letters, digits and underscores, and must not start with a digit.
But it seems to be overlooked to easily in the preamble. So maybe better to include that sentence with for every ID column.
Not sure how much sense it makes to print line numbers when operating with a DataFrame.
Dataframe row index preferred? Not really sure which one is more helpful...
Typehints in
check_condition_df
are inconsistent with the function signature, i.e., sbml_model is not and optional argument.
:+1:
Not sure how much sense it makes to print line numbers when operating with a DataFrame.
Dataframe row index preferred? Not really sure which one is more helpful...
I think printing the regex it needs to match would be more helpful than any line numbers. The row index would be equal to the offending ID here so that doesn't really add anything new. I don't think any further info is necessary and line numbers are misleading for non-file inputs.
Line numbers were added in #467 after a couple of people had issues with whitespace-only lines being added to the end of their PEtab TSV files (possibly after exporting from Excel.. not sure), which then caused errors (e.g. #466).
If line numbers are removed, then a check for whitespace-only lines in TSV files/rows in dataframes might be useful, as the ID in the error message may appear blank.
I guess the line numbers aren't always correct for files either e.g. when using multiple parameter subset files that are combined into a single dataframe.
Line numbers were added in #467 after a couple of people had issues with whitespace-only lines being added to the end of their PEtab TSV files (possibly after exporting from Excel.. not sure), which then caused errors (e.g. #466).
If line numbers are removed, then a check for whitespace-only lines in TSV files/rows in dataframes might be useful, as the ID in the error message may appear blank.
I guess the line numbers aren't always correct for files either e.g. when using multiple parameter subset files that are combined into a single dataframe.
I think when importing from files, it makes sense to print line-numbers, when importing from DataFrames, not so much. Should be possible to parse the input type somewhere.
I guess the line numbers aren't always correct for files either e.g. when using multiple parameter subset files that are combined into a single dataframe.
Good point.
So how shall we handle that? Drop line numbers? Just print the offending row? It's not too helpful for whitespace-only rows, but they should be relatively easy to spot anyways.
So how shall we handle that? Drop line numbers? Just print the offending row? It's not too helpful for whitespace-only rows, but they should be relatively easy to spot anyways.
Yes, but if pd.isna()
then add a hint in the error message to suggest that the user checks for whitespace-only lines, instead of printing the row, should be sufficient.
Which problem would you like to address? Please describe. Linting complains about invalid condition IDs. I believe the error is due to the fact that the condition id is numeric only. Ideally the error would be a bit more informative what the actual issue is with the condition ID.
Describe the solution you would like Change the spec to be more accurate about identifier restrictions or change linting to be more permissive about identifiers.
Describe alternatives you have considered Don't use numbers in identifiers.
Additional context Can be checked via
check_condition_df(pd.DataFrame({petab.CONDITION_ID: ['0']}).set_index(petab.CONDITION_ID), None)
.Not sure how much sense it makes to print line numbers when operating with a DataFrame.
Typehints in
check_condition_df
are inconsistent with the function signature, i.e., sbml_model is not and optional argument.