andersen-lab / Freyja

Depth-weighted De-Mixing
BSD 2-Clause "Simplified" License

Freyja dash: KeyError and ValueError #183

Closed bsalehe closed 9 months ago

bsalehe commented 9 months ago

Hello @joshuailevy,

I am trying to run the freyja 'dash' command, but I am getting two different errors. I suspect these errors are related to the metadata file, which I have attached.

  1. The first error is a 'ValueError', which I think is a pandas date-format issue. I am getting the following error message:

      File "/opt/miniconda3/envs/wastewater/bin/freyja", line 10, in <module>
        sys.exit(cli())
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
        return self.main(*args, **kwargs)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 1078, in main
        rv = self.invoke(ctx)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 783, in invoke
        return __callback(*args, **kwargs)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/freyja/_cli.py", line 438, in dash
        pd.to_datetime(meta_df['sample_collection_datetime'])
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/pandas/core/tools/datetimes.py", line 1108, in to_datetime
        cache_array = _maybe_cache(arg, format, cache, convert_listlike)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/pandas/core/tools/datetimes.py", line 254, in _maybe_cache
        cache_dates = convert_listlike(unique_dates, format)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/pandas/core/tools/datetimes.py", line 488, in _convert_listlike_datetimes
        return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/pandas/core/tools/datetimes.py", line 519, in _array_strptime_with_fallback
        result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
      File "strptime.pyx", line 534, in pandas._libs.tslibs.strptime.array_strptime
      File "strptime.pyx", line 355, in pandas._libs.tslibs.strptime.array_strptime
    ValueError: time data "23/01/2022" doesn't match format "%m/%d/%Y", at position 6. You might want to try:

    • passing format if your strings have a consistent format;
    • passing format='ISO8601' if your strings are all ISO8601 but not necessarily in exactly the same format;
    • passing format='mixed', and the format will be inferred for each element individually. You might want to use dayfirst alongside this.
  2. The second error (KeyError) is, I guess, related to the 'Sample' column of the metadata file, as freyja doesn't seem to recognize the sample names in that column. I am getting the following error:

    /opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/freyja/utils.py:500: FutureWarning: The behavior of array concatenation with empty entries is deprecated. In a future version, this will no longer exclude empty items when determining the result dtype. To retain the old behavior, exclude the empty entries before the concat operation.
      df_ab_lin = pd.concat([
    Traceback (most recent call last):
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3790, in get_loc
        return self._engine.get_loc(casted_key)
      File "index.pyx", line 152, in pandas._libs.index.IndexEngine.get_loc
      File "index.pyx", line 181, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'SRR18539103.variants.tsv'

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/opt/miniconda3/envs/wastewater/bin/freyja", line 10, in <module>
        sys.exit(cli())
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
        return self.main(*args, **kwargs)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 1078, in main
        rv = self.invoke(ctx)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/click/core.py", line 783, in invoke
        return __callback(*args, **kwargs)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/freyja/_cli.py", line 467, in dash
        make_dashboard(agg_df, meta_df, thresh, titleText, introText,
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/freyja/utils.py", line 597, in make_dashboard
        df_ab_lin, df_ab_sum, dates_to_keep = get_abundance(agg_df, meta_df,
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/freyja/utils.py", line 503, in get_abundance
        name=meta_df.loc[sampLabel,
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/pandas/core/indexing.py", line 1146, in __getitem__
        return self.obj._get_value(*key, takeable=self._takeable)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/pandas/core/frame.py", line 4015, in _get_value
        row = self.index.get_loc(index)
      File "/opt/miniconda3/envs/wastewater/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3797, in get_loc
        raise KeyError(key) from err

Could you please help with this?

Thanks.

Attachment: metadata1.csv
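For readers hitting the same ValueError: the message shows a day-first date ("23/01/2022") being checked against a month-first format ("%m/%d/%Y"). One possible workaround, sketched below, is to rewrite the date column as ISO 8601 strings before passing the metadata file to `freyja dash`. The sample names and CSV contents are made up for illustration, and the assumption that every date in the column is day-first must be checked against the real file.

```python
import io

import pandas as pd

# Hypothetical metadata with day-first dates, similar to the "23/01/2022"
# value in the traceback. The sample names are invented for this sketch.
raw_csv = io.StringIO(
    "Sample,sample_collection_datetime\n"
    "sampleA.variants.tsv,05/01/2022\n"
    "sampleB.variants.tsv,23/01/2022\n"
)
meta_df = pd.read_csv(raw_csv, index_col=0)

# Parse with an explicit day-first format instead of letting pandas infer one.
# Assumption: every date in the column is written as DD/MM/YYYY.
parsed = pd.to_datetime(meta_df["sample_collection_datetime"], format="%d/%m/%Y")

# Store the dates back as ISO 8601 strings (YYYY-MM-DD), which are unambiguous.
meta_df["sample_collection_datetime"] = parsed.dt.strftime("%Y-%m-%d")

print(meta_df)
```

The cleaned frame can then be written back with `meta_df.to_csv(...)` and used in place of the original metadata file.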

bsalehe commented 9 months ago

Hi @joshuailevy,

I eventually managed to sort out the KeyError. For the ValueError, I had to change the dates manually to be able to run the command. The KeyError was due to a mismatch in the number of samples between the 'Sample' column of the metadata file and the demix file. For the sake of testing the command, I removed the rows in the metadata file that were inconsistent with the demixed.tsv file. Perhaps it would be good to have a more informative error message when the issue is due to inconsistencies between the demixed.tsv and metadata.csv files.
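As a rough illustration of the kind of check suggested here, the sketch below compares the sample names of the two tables before any plotting happens and reports the ones that have no metadata row. `check_sample_names` is a hypothetical helper, not part of Freyja; the frames, the second sample name, and the lineage values are placeholders, and it is an assumption that both tables are indexed by sample name.

```python
import pandas as pd


def check_sample_names(agg_df: pd.DataFrame, meta_df: pd.DataFrame) -> None:
    """Hypothetical helper (not part of Freyja): fail early with a readable message."""
    # Sample names that appear in the demixed results but have no metadata row.
    missing_from_meta = agg_df.index.difference(meta_df.index)
    if len(missing_from_meta) > 0:
        raise ValueError(
            "Samples present in the demixed file but missing from the metadata: "
            + ", ".join(map(str, missing_from_meta))
        )


# Placeholder stand-ins for the demixed results and the metadata,
# both assumed to be indexed by sample name.
agg_df = pd.DataFrame(
    {"lineages": ["placeholder_1", "placeholder_2"]},
    index=["SRR18539103.variants.tsv", "SAMPLE_B.variants.tsv"],
)
meta_df = pd.DataFrame(
    {"sample_collection_datetime": ["2022-01-23"]},
    index=["SAMPLE_B.variants.tsv"],
)

try:
    check_sample_names(agg_df, meta_df)
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

Run against the real files, a check like this would name the offending samples directly instead of surfacing a bare KeyError from deep inside pandas.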

joshuailevy commented 9 months ago

Hi @bsalehe,

Glad you were able to figure it out. I'll definitely make sure to do so; the dash command still needs a bit of work. If you're willing to share the original demixed.tsv file that led to the issues, that'd be a huge help!

Thanks a ton in advance,
Josh