Closed: amsession closed this issue 4 months ago
Hi. This looks very similar to the issue in #1402, in that vg deconstruct
appears to be writing a line with no sample information:
[E::bcf_write] Broken VCF record, the number of columns at Chr1L:30057051 does not match the number of samples (0 vs 1)
Are you able to share the input data with me so I can try to reproduce? Failing that, if you could share the contents of /XlaXpe.Chr1L.txt
that may help a bit. Thanks
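For anyone who wants to locate such records before bcftools rejects them, here is a minimal sketch rather than anything cactus or vg provides. It assumes an uncompressed, tab-delimited VCF; the function name `check_vcf_columns` is hypothetical. It compares the column count of each record against the `#CHROM` header line and prints the mismatches.

```shell
# Report VCF data lines whose column count differs from the #CHROM header line.
# Assumption: plain-text, tab-delimited VCF (for .vcf.gz, decompress with `gzip -dc` first).
check_vcf_columns() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        /^##/ { next }                      # skip meta-information lines
        /^#CHROM/ { expected = NF; next }   # the header line defines the column count
        NF != expected {
            printf "%s:%s has %d columns, expected %d\n", $1, $2, NF, expected
            bad = 1
        }
        END { exit bad + 0 }                # non-zero exit if any record was broken
    ' "$1"
}
```

Calling `check_vcf_columns path/to/file.vcf` (placeholder path) prints one line per mismatching record and returns a non-zero status if any were found.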
Unfortunately, both FASTA files are too large to share here even after compression (25MB limit). I am attempting to align Chr1L sequences between the Xenopus laevis v10 genome here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_017654675.1/ , and the Xenopus petersii paternal assembly here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_038501925.1/ . The chromosome is named "Chr1L" in X. laevis and "1L" in X. petersii. There are massive misassemblies in the maternal assembly, so that should not be used. If there is an easier way to share the FASTA files I have directly, please let me know. The .txt file is attached.
Thanks!! I was able to reproduce it. Will fix ASAP. For the record, these are the commands I used (using v2.8.3):
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/017/654/675/GCF_017654675.1_Xenopus_laevis_v10.1/GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/038/501/925/GCA_038501925.1_aXenPet1.paternal.cur/GCA_038501925.1_aXenPet1.paternal.cur_genomic.fna.gz
gzip -d GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna.gz
gzip -d GCA_038501925.1_aXenPet1.paternal.cur_genomic.fna.gz
mkdir -p ./XlaChr
mkdir -p ./XpeChr
samtools faidx GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna NC_054371.1 > ./XlaChr/Chr1L.fa
samtools faidx GCA_038501925.1_aXenPet1.paternal.cur_genomic.fna CM076672.1 > ./XpeChr/1L.fa
printf "Xla ./XlaChr/Chr1L.fa\n" > XlaXpe.Chr1L.txt
printf "Xpe ./XpeChr/1L.fa\n" >> XlaXpe.Chr1L.txt
TOIL_SLURM_ARGS="--partition=long --time=8000" cactus-pangenome ./js ./XlaXpe.Chr1L.txt --outDir Chr1L --outName Chr1L --reference Xla --vcf --giraffe --gfa --gbz --consCores 32 --batchSystem slurm --logFile Chr1L.log --indexCores 32 --mgCores 32
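Before launching a long run like the one above, it can help to sanity-check the seqfile. The following is only a sketch: it assumes each line is `<name> <path>` with a local file path, as in the `printf` commands above, and the function name `check_seqfile` is hypothetical.

```shell
# Check that every seqfile line is "<name> <path>" and that the path is a non-empty file.
# Assumption: paths are local files (not URLs), as in the commands above.
check_seqfile() {
    _rc=0
    while read -r name path extra; do
        [ -z "$name" ] && continue            # skip blank lines
        if [ -z "$path" ] || [ -n "$extra" ]; then
            echo "malformed line for '$name'" >&2
            _rc=1
        elif [ ! -s "$path" ]; then
            echo "missing or empty FASTA: $path" >&2
            _rc=1
        fi
    done < "$1"
    return "$_rc"
}
```

For example, `check_seqfile ./XlaXpe.Chr1L.txt` returns a non-zero status and prints a message to stderr for each problem line.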
I am trying to run cactus-pangenome and was able to run it successfully on the example data set. However, when I use a single chromosome of real data with just 2 species, it seems to fail at the "make_vcf" step. I am unsure how to interpret the log file beyond that.
The log file is attached, and the exact command used was "apptainer exec ~/LOCAL.INSTALL/cactus/cactus_v2.8.3.sif cactus-pangenome ./js ./XlaXpe.Chr1L.txt --outDir Chr1L --outName Chr1L --reference Xla --vcf --giraffe --gfa --gbz --maxCores 32 --restart" . This is the latest log file, produced after trying to restart with more maxCores.
error_log5.txt