Open ARW-UBT opened 6 months ago
Can you provide the complete output? If you're running this on a SLURM HPC, it would be the "out" file. If you're running on a workstation in a bash terminal, it would be what is printed to the screen.
This will help me understand the settings you are using and also allow me to identify the offending line of code.
Hello, here are some outputs that I have in this project: log-fltrBAM_out.txt log-fltrBAM_out-error.txt log-mkBAM_out.txt
Thanks for your help. Best,
These are the types of files that I was looking for.
The `log-mkBAM_out.txt` and `log-fltrBAM_out.txt` files don't indicate any fatal errors, but the `log-fltrBAM_out-error.txt` file does indicate that 4 filtered bam files have a formatting error. Perhaps those 4 have no reads left after filtering?
You should have at least one more out file, from running mkVCF, that would help me with your original issue.
Further, dDocentHPC is compatible with the ddocent 2.7.8 dependencies, and I know there are unresolved issues with the 2.9.4 dependencies in dDocentHPC. That is probably why you're getting the mkVCF error. You can install the 2.7.8 dependencies with:
conda install -c bioconda ddocent=2.7.8
Here are some output files from a test run, this has also the mkVCF output (the out files from above are from a run that did not reach mkVCF step). fltrBAM.out.txt mkBAM.out.txt mkVCF.out.txt
> conda install -c bioconda ddocent=2.7.8
Thanks for this info, I will try v2.7.8 now. (Sorry, I had overlooked this recommendation on GitHub.)
Hi, I have installed ddocent 2.7.8 and started a new run. trimFQ completed successfully, creating the mkREF and mkBAM folders. mkREF does some calculations but does not complete successfully. Here is the out file and a copy of the additional screen output: out-mkREF.txt out-mkREF-screen.txt
Related to this step: I assume it should be possible to provide an existing reference genome file to dDocentHPC, such as `reference.fasta` in the classical dDocent script. I have provided a `reference.fasta` file (also in the dDocentHPC format, as `reference.2.2.fasta`) in the mkBAM folder and started the mkBAM analysis. mkBAM completes without error messages, but the .bam files are empty. out-mkBAM.txt
How do I have to prepare an existing reference file for the mkBAM step?
Installing ddocent 2.7.8 and bwa-meme seemed to solve the newest mkBAM issue. Will leave this open a little longer to confirm that mkVCF is working now.
Tried it till mkVCF and got this screen message. out-mkVCF.txt out-mkVCF-screen.txt
Getting back to the original error, which was not solved by the dDocent conda install package version: `vcfcombine` is a command in the `vcflib` package that quite literally combines the sub vcf files created by each parallelized thread. It's called on line 2247 of dDocentHPC.bash:

vcfcombine ./raw.$CUTOFFS.vcf/raw.*.$CUTOFFS.vcf | sed -e 's/ \.\:/ \.\/\.\:/g' > TotalRawSNPs.$CUTOFFS.vcf

My working version is from `vcflib v1.0.0-rc0-349-g45c6-dirty`. But before you install older versions of `vcflib`, I don't see how this can be the problem.
I hypothesize that the error is being triggered because there are no vcf files to combine, or they are all 0 bytes. To confirm, I would check whether the sub vcf files were created by the `freebayes` command and that their size is >0 bytes. They would be in your dir named `raw.2.2.vcf`.
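To make that check concrete, here is a minimal sketch (assuming you run it from the parent dir of `raw.2.2.vcf`; the directory name follows from your 2.2 cutoffs):

```shell
# List the sub vcf files with their sizes
ls -lh raw.2.2.vcf/raw.*.2.2.vcf

# Print only the 0-byte vcf files; no output here means none are empty
find raw.2.2.vcf -name 'raw.*.2.2.vcf' -size 0 -print
```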
If you do, indeed, have vcf files with data in `raw.2.2.vcf`, then I would troubleshoot by running the following from the parent dir of `raw.2.2.vcf`:
CUTOFFS=2.2 #because these are the cutoffs you used in naming your ref
vcfcombine ./raw.$CUTOFFS.vcf/raw.*.$CUTOFFS.vcf | sed -e 's/ \.\:/ \.\/\.\:/g' > TotalRawSNPs.$CUTOFFS.vcf
There are times when the sub vcf files are created by the dDocentHPC script but aren't combined into a single final file, and yet the same line runs just fine on its own.
If you don't have any vcf files, or they are all empty, then the `vcfcombine` error is a symptom rather than the cause. Make sure that `freebayes` has the files it needs to run. Your settings cause dDocent line 2214 to run the following `freebayes` command:
ls mapped.*.$CUTOFFS.bed | sed 's/mapped.//g' | sed 's/.bed//g' | shuf | $PARALLEL "freebayes -b cat.$CUTOFFS-RRG.bam -t mapped.{}.bed -v raw.{}.vcf -f reference.$CUTOFFS.fasta -p $PLOIDY -n $BEST_N_ALLELES -m $MIN_MAPPING_QUAL -q $MIN_BASE_QUAL -E $HAPLOTYPE_LENGTH --min-repeat-entropy $MIN_REPEAT_ENTROPY --min-coverage $MIN_COVERAGE -F $MIN_ALT_FRACTION -C $FREEBAYES_C -G $FREEBAYES_G -3 $FREEBAYES_3 -e $FREEBAYES_e -z $FREEBAYES_z -Q $FREEBAYES_Q -U $FREEBAYES_U -$ $FREEBAYES_DOLLAR --populations popmap.$CUTOFFS ${FREEBAYES_r}${FREEBAYES_report_monomorphic}${FREEBAYES_w}${FREEBAYES_V}${FREEBAYES_a}${FREEBAYES_no_partial_observations}"
Check that you have a `cat.2.2-RRG.bam` file with >0 bytes.
Check that you have `mapped.*.2.2.bed` files with >0 bytes.
If you are missing either then let me know.
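The two checks above can be scripted; a minimal sketch (assuming the 2.2 cutoffs, run from the working dir that holds the bam and bed files):

```shell
# -s is true only if the file exists and has size > 0
[ -s cat.2.2-RRG.bam ] && echo "cat bam OK" || echo "cat bam missing or empty"

# Report each mapped bed file; an unexpanded glob here means no bed files exist
for f in mapped.*.2.2.bed; do
  [ -s "$f" ] && echo "$f OK" || echo "$f missing or empty"
done
```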
If you have the files necessary for `freebayes` to run, then I propose the most likely cause of missing or empty vcf files is the `freebayes` settings. You've thinned your data for quick testing, so you may need to make the settings more lenient. I'd look at these in particular:
# in the config file under mkVCF:
2 Minimum Mean Depth of Coverage Per Individual Limit analysis to contigs with at least the specified mean depth of coverage per individual
0.375 freebayes -F --min-alternate-fraction (decimal 0-1) There must be at least 1 individual with this fraction of alt reads to evaluate the position. If your individuals are barcoded, then use 0.2. If your data is pooled, then set based upon ~ 1/(numIndivids * ploidy) and average depth of coverage
20 freebayes -C --min-alternate-count (integer) Require at least this count of observations supporting an alternate allele within a single individual in order to evaluate the position. default: 2
20 freebayes -G --min-alternate-total (integer) Require at least this count of observations supporting an alternate allele within the total population in order to use the allele in analysis. default: 1
# this one might take too long to run if you change it
no freebayes --report-monomorphic (no|yes) Report even loci which appear to be monomorphic, and report all considered alleles, even those which are not in called genotypes. Loci which do not have any potential alternates have '.' for ALT.
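As a worked example of the -F guideline for pooled data: with a pool of 10 diploid individuals, 1/(numIndivids × ploidy) = 1/(10 × 2) = 0.05. A hypothetical, more lenient test configuration (values closer to the freebayes defaults, shown for illustration only, not a recommendation for a final run) might look like:

```
# in the config file under mkVCF:
1     Minimum Mean Depth of Coverage Per Individual ...
0.2   freebayes -F --min-alternate-fraction ...   (barcoded individuals)
2     freebayes -C --min-alternate-count ...      (freebayes default)
1     freebayes -G --min-alternate-total ...      (freebayes default)
```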
Let me know how this goes
And for what it's worth, here are the packages in the container I use to run dDocentHPC on an HPC:
cbird@e3-w6420b-20:~$ crun conda list
# packages in environment at /opt/ddocent:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 0_gnu conda-forge
atk 2.36.0 0 conda-forge
atk-1.0 2.36.0 haf93ef1_0 conda-forge
bedtools 2.29.2 hc088bd4_0 bioconda
bwa 0.7.17 hed695b0_7 bioconda
bzip2 1.0.8 h516909a_2 conda-forge
ca-certificates 2020.4.5.1 hecc5488_0 conda-forge
cairo 1.16.0 hcf35c78_1003 conda-forge
cd-hit 4.8.1 hdbcaa40_0 bioconda
certifi 2020.4.5.1 py37hc8dfbb8_0 conda-forge
chrpath 0.16 h14c3975_1001 conda-forge
coreutils 8.31 h516909a_0 conda-forge
curl 7.68.0 h33f0ec9_1 conda-forge
dbus 1.13.6 he372182_0 conda-forge
ddocent 2.7.8 3 bioconda
expat 2.2.9 he1b5a44_2 conda-forge
fastp 0.20.0 hdbcaa40_0 bioconda
fontconfig 2.13.1 h86ecdb6_1001 conda-forge
freebayes 1.3.2 py37hc088bd4_0 bioconda
freetype 2.10.1 he06d7ca_0 conda-forge
fribidi 1.0.9 h516909a_0 conda-forge
gdk-pixbuf 2.38.2 h3f25603_3 conda-forge
gettext 0.19.8.1 hc5be6a0_1002 conda-forge
giflib 5.2.1 h516909a_2 conda-forge
glib 2.58.3 py37he00f558_1003 conda-forge
gnuplot 5.2.7 h0fb2448_3 conda-forge
gobject-introspection 1.58.2 py37h619baee_1004 conda-forge
graphite2 1.3.13 he1b5a44_1001 conda-forge
grep 2.14 h14c3975_3 bioconda
gst-plugins-base 1.14.5 h0935bb2_2 conda-forge
gstreamer 1.14.5 h36ae1b5_2 conda-forge
gtk2 2.24.32 h586f36d_1 conda-forge
harfbuzz 2.4.0 h9f30f68_3 conda-forge
htslib 1.9 h4da6232_3 bioconda
icu 64.2 he1b5a44_1 conda-forge
jpeg 9c h14c3975_1001 conda-forge
krb5 1.17.1 h2fd8d38_0 conda-forge
ld_impl_linux-64 2.34 h53a641e_0 conda-forge
libclang 9.0.1 default_hde54327_0 conda-forge
libcurl 7.68.0 hf7181ac_1 conda-forge
libdeflate 1.5 h516909a_0 conda-forge
libedit 3.1.20170329 hf8c457e_1001 conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc-ng 9.2.0 h24d8f2e_2 conda-forge
libgd 2.2.5 h307a58e_1007 conda-forge
libgomp 9.2.0 h24d8f2e_2 conda-forge
libiconv 1.15 h516909a_1006 conda-forge
libllvm9 9.0.1 hc9558a2_0 conda-forge
libpng 1.6.37 hed695b0_1 conda-forge
libssh2 1.8.2 h22169c7_2 conda-forge
libstdcxx-ng 9.2.0 hdf63c60_2 conda-forge
libtiff 4.1.0 hc3755c2_3 conda-forge
libuuid 2.32.1 h14c3975_1000 conda-forge
libwebp 1.0.2 h56121f0_5 conda-forge
libxcb 1.13 h14c3975_1002 conda-forge
libxkbcommon 0.10.0 he1b5a44_0 conda-forge
libxml2 2.9.10 hee79883_0 conda-forge
lz4-c 1.8.3 he1b5a44_1001 conda-forge
mawk 1.3.4 h14c3975_2 bioconda
ncurses 6.1 hf484d3e_1002 conda-forge
nspr 4.25 he1b5a44_0 conda-forge
nss 3.47 he751ad9_0 conda-forge
openssl 1.1.1f h516909a_0 conda-forge
pango 1.42.4 h7062337_4 conda-forge
parallel 20200322 0 conda-forge
pcre 8.44 he1b5a44_0 conda-forge
pear 0.9.6 h98de208_5 bioconda
perl 5.26.2 h516909a_1006 conda-forge
pip 20.0.2 py_2 conda-forge
pixman 0.38.0 h516909a_1003 conda-forge
pthread-stubs 0.4 h14c3975_1001 conda-forge
python 3.7.6 h8356626_5_cpython conda-forge
python_abi 3.7 1_cp37m conda-forge
qt 5.12.5 hd8c4c69_1 conda-forge
rainbow 2.0.4 h14c3975_3 bioconda
readline 8.0 hf8c457e_0 conda-forge
samtools 1.9 h10a08f8_12 bioconda
sed 4.7 h1bed415_1000 conda-forge
seqtk 1.3 hed695b0_2 bioconda
setuptools 46.1.3 py37hc8dfbb8_0 conda-forge
sqlite 3.30.1 hcee41ef_0 conda-forge
tk 8.6.10 hed695b0_0 conda-forge
unzip 6.0 h516909a_0 conda-forge
vcflib 1.0.0_rc3 py37hc088bd4_0 bioconda
vcftools 0.1.16 he860b03_3 bioconda
wheel 0.34.2 py_1 conda-forge
xorg-kbproto 1.0.7 h14c3975_1002 conda-forge
xorg-libice 1.0.10 h516909a_0 conda-forge
xorg-libsm 1.2.3 h84519dc_1000 conda-forge
xorg-libx11 1.6.9 h516909a_0 conda-forge
xorg-libxau 1.0.9 h14c3975_0 conda-forge
xorg-libxdmcp 1.1.3 h516909a_0 conda-forge
xorg-libxext 1.3.4 h516909a_0 conda-forge
xorg-libxrender 0.9.10 h516909a_1002 conda-forge
xorg-libxt 1.2.0 h516909a_0 conda-forge
xorg-renderproto 0.11.1 h14c3975_1002 conda-forge
xorg-xextproto 7.3.0 h14c3975_1002 conda-forge
xorg-xproto 7.0.31 h14c3975_1007 conda-forge
xz 5.2.5 h516909a_0 conda-forge
zlib 1.2.11 h516909a_1006 conda-forge
zstd 1.4.4 h3b9ef0a_2 conda-forge
> Check that you have a `cat.2.2-RRG.bam` file with >0 bytes.
> Check that you have `mapped.*.2.2.bed` files with >0 bytes.
I do have the cat-file: 241M 9. Jan 16:13 cat.2.2-RRG.bam
I do have a mapped bed file: 362K 9. Jan 16:13 mapped.2.2.bed, but I am not sure whether there should be more of them, according to your wildcard.
The raw.2.2.vcf folder is indeed empty.
There's a mawk script starting on line 2121 of dDocentHPC.bash that splits the bed file into N files based upon the number of threads available. If you don't have a set of bed files, then I think that's where the issue is. In the present dDocentHPC.bash script, the line of code that deletes those bed files is commented out, so they should remain. You can try copying the mawk script on line 2121 into a text editor, removing the tabs, then pasting it into your terminal and running it.
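For intuition only (this is not the dDocentHPC mawk code, just a sketch of the same idea using `split`), breaking `mapped.2.2.bed` into `NUMProc` pieces could look like:

```shell
CUTOFFS=2.2
NUMProc=4

# ceil(total / NUMProc) lines per chunk
total=$(wc -l < mapped.$CUTOFFS.bed)
lines=$(( (total + NUMProc - 1) / NUMProc ))

# split into tmp.bed.0 .. tmp.bed.3, then rename to mapped.1.2.2.bed .. mapped.4.2.2.bed
split -l "$lines" -d -a 1 mapped.$CUTOFFS.bed tmp.bed.
i=1
for f in tmp.bed.*; do
  mv "$f" "mapped.$i.$CUTOFFS.bed"
  i=$((i + 1))
done
```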
Additionally, to run the mawk script, you first need to set the variables

CUTOFFS=2.2
NUMProc=4 # this will cause 4 bed files to be made

and run the if-then statement on line 2111 (remove the tabs).
In order to check whether the thinned-out data in my test run is causing some of the issues, I decided to reuse my old 150 bp paired-end data from the previous dDocent analyses (which was successfully analyzed by dDocent and Stacks) and feed it into dDocentHPC. I started with the trimmed R1 and R2 reads.
The mkVCF step started and was quite busy for >20 minutes, but then stopped with the error below.
vcfcombine: symbol lookup error: /home/bt140047/miniconda3/envs/ddocent-2.7.8/bin/../lib/libvcflib.so.1: undefined symbol: wavefront_align
This time, however, there were no messages related to missing vcf or bed files (they are there).
It seems to me that the vcfcombine step is causing the error message on the screen, as I can reproduce it by manually running the 'solution a' instruction mentioned above:
CUTOFFS=2.2
vcfcombine ./raw.$CUTOFFS.vcf/raw.*.$CUTOFFS.vcf | sed -e 's/ \.\:/ \.\/\.\:/g' > TotalRawSNPs.$CUTOFFS.vcf
Oops! I just realized that the 33 vcf files in the raw folder were created, but are empty... Probably due to filtering settings... Sorry for not having seen this before...
That's progress; now that you have the vcf files, troubleshooting combining them should be straightforward.
I had a conversation with ChatGPT, and its primary conclusion, "that there's a mismatch between the vcfcombine executable and the version of the library (libvcflib.so.1) it is trying to use", seems reasonable. I suspect that `vcfcombine` is the one installed by 2.7.8 (try `vcfcombine -v` in the two different environments) and the `libvcflib.so.1` being accessed might be the newer version from vcflib (1.0.9 is the newest version on GitHub).

As a quick test, you might try changing to the newer ddocent environment and running the vcfcombine step again. If you look over the ChatGPT transcript, it also suggests some solutions to resolve the issue in the older ddocent environment. I additionally tried to have ChatGPT narrow down where the libvcflib.so.1 files are located in your dir.
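A hypothetical way to check which library each environment's `vcfcombine` actually loads (standard `which`/`ldd` diagnostics, nothing specific to dDocentHPC; run inside each conda env):

```shell
# Which vcfcombine binary is on the PATH in this env
which vcfcombine

# Which shared libraries it resolves at runtime; look for the libvcflib.so.1 path
ldd "$(which vcfcombine)" | grep vcflib

# Where libvcflib lives inside the active conda env
find "$CONDA_PREFIX/lib" -name 'libvcflib.so*' 2>/dev/null
```

If the `libvcflib.so.1` path printed by `ldd` comes from a different env than the `vcfcombine` binary itself, that would confirm the mismatch.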
The vcf files in the raw.2.2.vcf directory are empty (0 bytes), so I assume vcfcombine will not work. Since the bed files are not empty, would it make sense to adjust the config settings for mkVCF and rerun mkVCF in order to get anything written into the vcf files? If so, which parameters should be changed in order to avoid filtering out too much? Just to see whether this works; I could then iteratively adjust the mkVCF config settings.
After that I would rerun vcfcombine from the newest dDocent environment.
Does this make sense?
Hello, the dDocentHPC analysis completed almost to the end, but at the very end I get the following error message:

vcfcombine: symbol lookup error: /home/bt140047/mambaforge/envs/ddocent/bin/../lib/libvcflib.so.1: undefined symbol: wavefront_align
The ddocent conda env contains that vcflib package:
Name Version Build Channel
vcflib 1.0.9 h146fbdb_4 bioconda
And it seems that freebayes needs this version:

$ mamba repoquery whoneeds vcflib
Executing the query vcflib

 Name      Version Build       Depends                Channel
 ddocent   2.9.4   hdfd78af_1  vcflib >=0.1.11        bioconda
 freebayes 1.3.7   h1870644_0  vcflib >=1.0.9,<2.0a0  bioconda
Does anybody have a suggestion how to proceed? Best regards,