WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. #229

Closed alyzzabc closed 1 year ago

alyzzabc commented 1 year ago

Hi,

I was running DRAM-v.py annotate and it looked like it finished in 6 hours. I do have the annotations.tsv file. This is the Slurm output:

2022-10-20 20:20:57.811197: Viral annotation started
0:00:00.670406: Retrieved database locations and descriptions
0:00:00.670453: Annotating final-viral-combined-for-dramv
0:04:07.746455: Turning genes from prodigal to mmseqs2 db
0:04:11.094337: Getting hits from kofam
2:00:55.141748: Getting forward best hits from viral
2:04:13.874541: Getting reverse best hits from viral
2:04:29.917748: Getting descriptions of hits from viral
2:04:31.191055: Getting forward best hits from peptidase
2:07:17.292079: Getting reverse best hits from peptidase
2:07:18.794918: Getting descriptions of hits from peptidase
2:07:19.916027: Getting hits from pfam
2:08:38.381901: Getting hits from dbCAN
2:11:26.495058: Getting hits from VOGDB
4:07:17.594519: Merging ORF annotations
6:19:39.563058: Annotations complete, processing annotations
6:19:41.116050: Annotations complete, assigning auxiliary scores and flags
6:24:37.870109: Completed annotations

But the error log file was not empty; it contained this warning:

/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/lib/python3.10/site-packages/mag_annotator/annotate_vgfs.py:184: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  virsorter_genes['start_position'] = virsorter_genes['start_position'].astype(int)
/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/lib/python3.10/site-packages/mag_annotator/annotate_vgfs.py:185: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  virsorter_genes['end_position'] = virsorter_genes['end_position'].astype(int)

I was wondering if this warning would affect the results I got.

rmFlynn commented 1 year ago

It will not. The warning is a hold-over from ye olde pandas and does not affect the results.
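
For readers who want to see what the warning is about, here is a minimal sketch of the pattern pandas is flagging and one common way to avoid it. The DataFrame, the "source" column, and the filter are hypothetical; only the variable name virsorter_genes and the two column names come from the warning text, and this is not the actual code in annotate_vgfs.py.

```python
import pandas as pd

# Hypothetical stand-in for the gene table; positions arrive as strings.
genes = pd.DataFrame({
    "source": ["virsorter", "other", "virsorter"],
    "start_position": ["10", "50", "200"],
    "end_position": ["40", "90", "260"],
})

# Filtering returns a subset that pandas may treat as a slice of `genes`.
# Assigning a column to that subset is what can trigger
# SettingWithCopyWarning. Taking an explicit .copy() makes the subset an
# independent DataFrame, so the assignments below are unambiguous.
virsorter_genes = genes[genes["source"] == "virsorter"].copy()
virsorter_genes["start_position"] = virsorter_genes["start_position"].astype(int)
virsorter_genes["end_position"] = virsorter_genes["end_position"].astype(int)

print(virsorter_genes)
```

In this sketch the converted columns live on the subset and the original table is left untouched. The warning only says that pandas could not tell which of the two objects was meant to be modified.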

sahilrishav2 commented 4 months ago

Hello @rmFlynn, I got the same warning. I then ran this command:

DRAM-v.py distill -i dramv-annotate/annotations.tsv -o dramv-distill

and got an error:

2024-05-11 10:33:16,308 - The log file is created at dramv-distill/distill.log
2024-05-11 10:33:16,329 - Retrieved database locations and descriptions
2024-05-11 10:33:16,331 - Note: the fallowing id fields were not in the annotations file and are not being used: ['kegg_genes_id', 'kegg_id', 'cazy_best_hit', 'camper_id', 'fegenie_id', 'sulfur_id', 'methyl_id'], but these are ['ko_id', 'kegg_hit', 'peptidase_family', 'pfam_hits']
2024-05-11 10:33:16,331 - Determined potential amgs
/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/summarize_vgfs.py:64: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  strand = strandedness[0]
/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/summarize_vgfs.py:66: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  if strandedness[i] != strand:
/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/summarize_vgfs.py:68: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  strand = strandedness[i]
/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/summarize_vgfs.py:64: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  strand = strandedness[0]
/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/summarize_vgfs.py:66: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  if strandedness[i] != strand:
2024-05-11 10:33:16,373 - Calculated viral genome statistics
2024-05-11 10:33:16,389 - No distillate information found for 0 genes.
2024-05-11 10:33:16,392 - Generated AMG summary
Traceback (most recent call last):
  File "/home/rishav/anaconda3/envs/DRAM/bin/DRAM-v.py", line 161, in <module>
    args.func(**args_dict)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/summarize_vgfs.py", line 284, in summarize_vgfs
    product.save(path.join(output_dir, 'product.html'))
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/vegalite/v4/api.py", line 488, in save
    result = save(**kwds)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/save.py", line 83, in save
    spec = chart.to_dict()
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/vegalite/v4/api.py", line 384, in to_dict
    dct = super(TopLevelMixin, copy).to_dict(*args, **kwargs)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/schemapi.py", line 326, in to_dict
    result = _todict(
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/schemapi.py", line 60, in _todict
    return {
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/schemapi.py", line 61, in <dictcomp>
    k: _todict(v, validate, context)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/schemapi.py", line 58, in _todict
    return [_todict(v, validate, context) for v in obj]
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/schemapi.py", line 58, in <listcomp>
    return [_todict(v, validate, context) for v in obj]
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/schemapi.py", line 56, in _todict
    return obj.to_dict(validate=validate, context=context)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/vegalite/v4/api.py", line 2020, in to_dict
    return super().to_dict(*args, **kwargs)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/vegalite/v4/api.py", line 374, in to_dict
    copy.data = _prepare_data(original_data, context)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/vegalite/v4/api.py", line 89, in _prepare_data
    data = _pipe(data, data_transformers.get())
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/toolz/functoolz.py", line 628, in pipe
    data = func(data)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/toolz/functoolz.py", line 304, in __call__
    return self._partial(*args, **kwargs)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/vegalite/data.py", line 19, in default_data_transformer
    return curried.pipe(data, limit_rows(max_rows=max_rows), to_values)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/toolz/functoolz.py", line 628, in pipe
    data = func(data)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/toolz/functoolz.py", line 304, in __call__
    return self._partial(*args, **kwargs)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/data.py", line 149, in to_values
    data = sanitize_dataframe(data)
  File "/home/rishav/anaconda3/envs/DRAM/lib/python3.10/site-packages/altair/utils/core.py", line 317, in sanitize_dataframe
    for col_name, dtype in df.dtypes.iteritems():
  File "/home/rishav/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 6296, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'iteritems'

Please help me with this concern.
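
Two different things appear in the log above. The FutureWarning lines are deprecation notices about integer keys on a Series. The AttributeError is the actual failure: Series.iteritems() was removed in pandas 2.0 in favour of Series.items(), and the Altair v4 code in the traceback still calls it. The traceback also shows pandas being imported from /home/rishav/.local/lib/python3.10/site-packages rather than from the DRAM conda environment, which suggests (an inference from the paths, not something confirmed in this thread) that a newer pandas in the user site directory is shadowing the one the environment was built with. The sketch below prints which pandas is in use and shows the modern equivalents of both calls; the Series and DataFrame are hypothetical stand-ins, not DRAM data.

```python
import pandas as pd

# Which pandas is actually imported, and from where? Run this inside the
# activated DRAM environment to compare with the paths in the traceback.
print(pd.__version__, pd.__file__)

# Hypothetical stand-in for `strandedness`: a Series with a non-integer index.
strandedness = pd.Series([1, 1, -1], index=["gene_1", "gene_2", "gene_3"])

# Positional access via .iloc avoids the FutureWarning about integer keys.
strand = strandedness.iloc[0]
switches = 0
for i in range(len(strandedness)):
    if strandedness.iloc[i] != strand:
        switches += 1
        strand = strandedness.iloc[i]

# .items() is the replacement for the removed .iteritems().
df = pd.DataFrame({"gene": ["a", "b"], "score": [1.0, 2.0]})
dtype_by_column = {name: str(dtype) for name, dtype in df.dtypes.items()}

print(switches, dtype_by_column)
```

If the printed path points outside the conda environment, the environment is probably not using the pandas version it was installed with, and the pandas and Altair versions need to be brought back into agreement.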