Closed ACEnglish closed 3 months ago
Hi @ACEnglish,
I ran into an issue running the following command:
trgt db create -o strains.tdb trgt_out/SJL.sorted.vcf.gz
The error message is ValueError: cannot reindex on an axis with duplicate labels
Traceback (most recent call last):
  File "/home/fangzq/.conda/envs/trgt/bin/trgt", line 33, in <module>
    sys.exit(load_entry_point('TRGT', 'console_scripts', 'trgt')())
  File "/home/fangzq/github/trgt/trgt/__main__.py", line 44, in main
    CMDS[args.cmd](args.options)
  File "/home/fangzq/github/trgt/trgt/dbcmds.py", line 28, in db_main
    CMDS[args.cmd][1](args.options)
  File "/home/fangzq/github/trgt/trgt/database/create.py", line 74, in create_main
    n_data = trgt.load_tdb(i) if i.rstrip('/').endswith(".tdb") else trgt.vcf_to_tdb(i)
  File "/home/fangzq/github/trgt/trgt/database/dbutils.py", line 209, in vcf_to_tdb
    allele_df = pull_alleles(data)
  File "/home/fangzq/github/trgt/trgt/database/dbutils.py", line 156, in pull_alleles
    alleles["LocusID"] = data["LocusID"]
Do you have any preprocessing step for importing trgt output into trgtdb?
My trgt command is:
./trgt-v0.8.0-linux_x86_64 --genome mm10.fa --repeats tr_catalog.adjusted.mm10.bed --reads SJL.aligned.sorted.bam --output-prefix trgt_out/SJL --threads 6
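For anyone hitting the same traceback: pandas raises "cannot reindex on an axis with duplicate labels" when a column assignment has to align two objects and one of the indexes contains repeated labels. One possible cause (an assumption, not confirmed anywhere in this thread) is that the VCF contains more than one record for the same locus, for example because the repeat catalog lists a region twice. The sketch below uses only the standard library to report such records. It assumes the locus identifier is stored in the INFO column as TRID=...; the helper names open_vcf and duplicate_loci are made up for illustration and are not part of trgt.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def duplicate_loci(lines):
    """Return {(chrom, pos, trid): count} for every locus seen more than once.

    Assumes (hypothetically) that the locus ID is in INFO as TRID=<id>;
    records without that key are grouped by chrom and pos alone.
    """
    counts = Counter()
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], fields[1], fields[7]
        trid = None
        for entry in info.split(";"):
            if entry.startswith("TRID="):
                trid = entry[len("TRID="):]
        counts[(chrom, pos, trid)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Usage would look like `with open_vcf("trgt_out/SJL.sorted.vcf.gz") as fh: print(duplicate_loci(fh))`. An empty result means duplicate loci are not the cause and the problem lies elsewhere.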
The database tool has been refactored and moved into its own repository at https://github.com/ACEnglish/tdb.
@zqfang - Please try from that repository and if the error still happens, open a ticket there.
Adding code for converting a TRGT output VCF into a database. See tdb_tutorial.md for usage details.
TODOs:
- truvari.vcf2df: until truvari v4.0 is released, truvari will need to be manually installed. After truvari v4.0 is cut, we can simply uncomment the line in trgt/setup.py that installs it (line 31).
- trgt.database.dbutils.pull_saps assumes the allele length range is stored in the VCF as FORMAT/ALLR. However, trgt v0.3.4 writes FORMAT/ALCI, so this code isn't compatible with trgt v0.3.4.
- trgt.__main__: for wrapping the trgt main executable. If we want to distribute trgt with a single command line interface (e.g. trgt run, trgt viz, trgt db), we'll need to place the executables into the repository, update MANIFEST.in to package those executables, and then make external calls from run_main (e.g. Popen(os.path.join(trgt.__file__, 'bin', 'trgt'))).
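A rough sketch of the last TODO item, wrapping a bundled executable from run_main, might look like the following. It is only an illustration under assumptions: the bin/ directory inside the package and the bundled_executable helper are hypothetical, and the signature of run_main here is invented for the example. One detail worth noting is that a package's __file__ points at its __init__.py, so the path has to be built from the directory containing that file.

```python
import os
import subprocess
import sys


def bundled_executable(package_file, name="trgt"):
    """Build the path <package dir>/bin/<name>.

    package_file is expected to be something like trgt.__file__;
    the bin/ layout is an assumption for this sketch.
    """
    return os.path.join(os.path.dirname(package_file), "bin", name)


def run_main(args, package_file):
    """Forward args to the bundled executable and return its exit code."""
    exe = bundled_executable(package_file)
    if not os.path.isfile(exe):
        sys.stderr.write("bundled executable not found: %s\n" % exe)
        return 1
    # subprocess.call waits for the child and returns its return code.
    return subprocess.call([exe] + list(args))
```

With this shape, a `trgt run` subcommand could call `run_main(args.options, trgt.__file__)` and pass the result to sys.exit, so that the wrapper's exit status mirrors the executable's.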