HurlesGroupSanger / indelible

Structural Variation breakpoint discovery via adaptive learning
GNU General Public License v3.0

Issue: build_database with prior #8

Closed June3billion closed 2 years ago

June3billion commented 2 years ago

Dear indelible guys.

I encountered the following error

`FileNotFoundError: [Errno 2] No such file or directory: '/Users/eg15/PycharmProjects/Indelible/test/test_db.txt'`

with the following command

/NAS/data/personal/june_dev/Tools/Python/anaconda3/bin/python indelible.py complete --keeptmp \
 --config /NAS/data/etc/indelible/example_config.hg19.yml \
 --priors /NAS/data/etc/indelible/data/Indelible_db_10k.bed \
 --i /NAS/data/etc/indelible/test_data/DDD_MAIN5194229_Xchrom_subset_sorted.bam \
 --o /NAS/data/etc/indelible/test_data/output \
 --r /NAS/data/etc/indelible/data/hs37d5.fa

How can I use /NAS/data/etc/indelible/data/Indelible_db_10k.bed for MAF annotation?

eugenegardner commented 2 years ago

Hello @June3billion – I have pushed a fix for this issue. I had erroneously left in a hard-coded path during testing. Please pull the latest version, test, and let me know if this resolves the issue.
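
For readers hitting a similar error, a minimal sketch of avoiding a hard-coded developer path: build the path from a user-supplied directory and fail early with a clear message. The helper names `resolve_db_path` and `require_file` and the default file name are hypothetical; this is not the actual change made to Indelible.

```python
from pathlib import Path


def resolve_db_path(output_dir, db_name="indelible_db.tsv"):
    """Build the database path from a user-supplied directory.

    Both this helper and the default file name are hypothetical.
    """
    return Path(output_dir).expanduser() / db_name


def require_file(path):
    """Return the path if it is an existing file, else raise a clear error."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected input file does not exist: {path}")
    return path


# The path now follows whatever directory the user passes in,
# rather than a location that only exists on the developer's machine.
db_path = resolve_db_path("output")
print(db_path)
```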

June3billion commented 2 years ago

Thank you for the kind response! But I just encountered another issue with the latest version...

sys:1: DtypeWarning: Columns (0) have mixed types.Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "indelible.py", line 210, in <module>
    indelible.build_database(score_list_path, db_path, args.reference_path, config, args.priors, args.bwa_thread)
  File "/NAS/data/etc/indelible/indelible/build_database.py", line 233, in build_database
    priors_frame = build_priors(priors, final_frame)
  File "/NAS/data/etc/indelible/indelible/build_database.py", line 43, in build_priors
    priors_frame = priors_frame.drop(index=already_found)
  File "/NAS/data/personal/june_dev/Tools/Python/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 4308, in drop
    return super().drop(
  File "/NAS/data/personal/june_dev/Tools/Python/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 4153, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/NAS/data/personal/june_dev/Tools/Python/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 4207, in _drop_axis
    raise KeyError(f"{labels} not found in axis")
KeyError: "['X:153296083' 'X:153296122'] not found in axis"

eugenegardner commented 2 years ago

I think this issue has to do with a difference in how index columns are handled in newer versions of the pandas module. I have slightly modified the function that reads priors to hopefully address this. Could you once again pull the latest version and test?
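
The `KeyError` in the traceback is what `DataFrame.drop` raises when a requested label is absent from the index. The sketch below reproduces that failure on a toy frame and shows two defensive alternatives. The frame contents and the `coord` column name are hypothetical, and this is not necessarily the change that was made in `build_database.py`.

```python
import pandas as pd

# Hypothetical priors table; in practice it would be read from the priors file.
priors_frame = pd.DataFrame(
    {"coord": ["X:100", "X:200", "X:300"], "maf": [0.01, 0.20, 0.05]}
)

# Coordinates already present in the newly built database (hypothetical values).
already_found = ["X:200", "X:999"]

# With the default integer index, none of these labels exist,
# so drop() raises KeyError ("... not found in axis").
try:
    priors_frame.drop(index=already_found)
    raised = False
except KeyError:
    raised = True

# Option 1: index by coordinate and ignore labels that are not present.
indexed = priors_frame.set_index("coord")
remaining_ignore = indexed.drop(index=already_found, errors="ignore")

# Option 2: keep only rows whose coordinate is not in the list.
remaining_isin = indexed[~indexed.index.isin(already_found)]

print(raised)
print(list(remaining_ignore.index))
print(list(remaining_isin.index))
```

Either option tolerates coordinates that appear in `already_found` but not in the priors, which is the situation the traceback describes.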

June3billion commented 2 years ago

Thank you! The latest version worked well. However, I found that the input MAF data (Indelible_db_10k.bed) was not reflected in the final output: the MAF column of the output file from the example data is 1. I assume those frequencies came from the input data itself. Can I restrict the MAF information to the input database?

eugenegardner commented 2 years ago

Yes, I can see why one could want the MAF information from the priors. I see two use-cases here:

  1. The original design – To simply filter based on presence/absence in samples from the user's own research cohort
  2. Potentially new use – To use it to determine MAF in a sample population (kind of like using gnomAD/UKBiobank?)

Let me see about adding an option flag to retain the MAF of the prior database.

June3billion commented 2 years ago

Thank you so much! I really appreciate it.

eugenegardner commented 2 years ago

I have pushed a new build which includes a flag (--old-maf) for the database / complete commands. Could you let me know if it works for you?

Thanks!
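
As an illustration of how such a flag might behave, here is a minimal sketch using `argparse`. Only the flag name `--old-maf` comes from this thread; the parser, the `choose_maf` helper, and the fallback rule (use the cohort MAF when a site is missing from the priors) are assumptions, not Indelible's actual implementation.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing only the options discussed in this thread."""
    parser = argparse.ArgumentParser(prog="indelible-sketch")
    parser.add_argument("--priors", help="path to a priors database")
    parser.add_argument(
        "--old-maf",
        action="store_true",
        help="retain the MAF recorded in the priors database",
    )
    return parser


def choose_maf(coord, cohort_maf, prior_maf, old_maf):
    """Pick the MAF to report for one breakpoint (assumed behaviour)."""
    if old_maf and coord in prior_maf:
        return prior_maf[coord]
    return cohort_maf[coord]


# argparse exposes --old-maf as the attribute args.old_maf.
args = build_parser().parse_args(["--priors", "Indelible_db_10k.bed", "--old-maf"])

# Hypothetical values: a single-sample run gives MAF 1, the priors give a rarer value.
prior_maf = {"X:153296083": 0.002}
cohort_maf = {"X:153296083": 1.0}
print(choose_maf("X:153296083", cohort_maf, prior_maf, args.old_maf))
```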

June3billion commented 2 years ago

It worked! Thank you!

eugenegardner commented 2 years ago

Good to hear. I am going to close this issue. Please reopen if you encounter any other problems.