Closed coolkom closed 3 years ago
Dear Komlan, You're perfectly right. This is on my todo list. Should be done this week or the next one Best, Pierre
Hello Pierre, Thanks for replying. I have tested DiscoSnpRad and noticed a couple of bugs I can report here so that you can include modifs in the next release:
1- In the "discoRAD_clustering.sh" file, in order to run the script in case one used the released version or a built one, please replace from line 77, what is between :
Detect the directory path and #################### PARAMETERS VALUES #######################
by
Detect the directory path EDIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd ) if [ -d "$EDIR/../../build/" ] ; then # VERSION SOURCE COMPILED BINDIR=$EDIR"/../../build/bin" else # VERSION BINARY BINDIR=$EDIR"/../../bin" fi rawdiscofile_base=$( basename "${rawdiscofile}" .fa) #################### PARAMETERS VALUES #######################
2- Please comment lines 208 and 209 so that one can use ${disco_simpler}* and ${disco_filtered}.fa files later
3- In the "run_discoSnpRad.sh" script file, on line 527, replace : " kissnp2Cmd="${kissnp2_bin} -in $h5prefix.h5 -out ${kissprefix}_r -b $b $l $x -P $P -D $D $extend $option_cores_gatb $output_coverage_option -coverage_file ${h5prefix}_cov.h5 -max_ambigous_indel ${max_ambigous_indel} -max_symmetrical_crossroads ${option_max_symmetrical_crossroads} -verbose $verbose -max_truncated_path_length_difference ${max_truncated_path_length_difference}" "
by
" kissnp2Cmd="${kissnp2_bin} -in $h5prefix.h5 -out ${kissprefix}_r -b $b $l $x -P $P -D $D $extend $option_cores_gatb $output_coverage_option -coverage_file ${h5prefix}_cov.h5 -max_ambigous_indel ${max_ambigous_indel} -max_symmetrical_crossroads ${option_max_symmetrical_crossroads} -verbose $verbose" "
Since the flag "-max_truncated_path_length_difference" is not in use, this halts the run.
4- Since the -G flag is already (partially) implemented in the run_discoSnpRad.sh" script file, but not in use, I suggest you simply add the following at the end of the file, to create a second vcf file that will use the reference genome. This way, we get a "denovo" & "refmap" vcf files:
####################################################################### #################### Deal with ref mapped VCF ############################### #######################################################################
T="$(date +%s)" echo -e "$yellow ###############################################################" echo -e " #################### CREATE VCF with REF MAP #######################" echo -e " #################################################################### $reset"
if [ -n "$genome" ]; then # A Reference genome is provided vcf_ref_output="${kissprefix}_refmap.vcf" vcfCreatorCmd="$EDIR/../scripts/run_VCF_creator.sh -G $genome -p ${kissprefix}_raw_filtered.fa -o $vcf_ref_output -I $option_cores_post_analysis $e" echo $green$vcfCreatorCmd$cyan if [[ "$wraith" == "false" ]]; then $vcfCreatorCmd fi
if [ $? -ne 0 ]
then
echo "$red there was a problem with VCF creation. See how to use the \"run_VCF_creator.sh\" alone.$reset"
exit 1
fi
fi echo $reset T="$(($(date +%s)-T))" if [[ "$wraith" == "false" ]]; then echo -e "${yellow}\t###############################################################" echo -e "\t#################### DISCOSNPRAD FINISHED ######################" echo -e "\t###############################################################" Ttot="$(($(date +%s)-Ttot))" echo "DiscoSnpRad total time in seconds: ${Ttot}" echo -e "\t################################################################################################################" echo -e "\t fasta of predicted variant is \""${kissprefix}_raw.fa"\"" echo -e "\t VCF file for denovo analysis (1-based) is \""${final_output}"\"" echo -e "\t An IGV ready VCF file (sorted by position, only mapped variants, 0-based) is \""${kissprefix}_refmap_for_IGV.vcf"\"" echo -e "\t Thanks for using discoSnpRad - http://colibread.inria.fr/discoSnp/ - Forum: http://www.biostars.org/t/discoSnp/" echo -e "\t################################################################################################################${reset}" fi
Of course, I'm just suggesting, you are in a better position to decide how to implement this best ;-)
Thanks !
Best, Komlan.
Thanks for this suggestions
1- In the "discoRAD_clustering.sh" file, in order to run the script in case one used the released version or a built one, please replace from line 77, what is between :
Detect the directory path and #################### PARAMETERS VALUES #######################
by
Detect the directory path EDIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd ) if [ -d "$EDIR/../../build/" ] ; then # VERSION SOURCE COMPILED BINDIR=$EDIR"/../../build/bin" else # VERSION BINARY BINDIR=$EDIR"/../../bin" fi rawdiscofile_base=$( basename "${rawdiscofile}" .fa) #################### PARAMETERS VALUES ####################### This was missing, thanks a lot. 2- Please comment lines 208 and 209 so that one can use ${disco_simpler}* and ${disco_filtered}.fa files later
I prefer to remove disco_simpler
which is useless. I now keep ${disco_filtered}.fa
3- In the "run_discoSnpRad.sh" script file, on line 527, replace : " kissnp2Cmd="${kissnp2_bin} -in $h5prefix.h5 -out ${kissprefix}_r -b $b $l $x -P $P -D $D $extend $option_cores_gatb $output_coverage_option -coverage_file ${h5prefix}_cov.h5 -max_ambigous_indel ${max_ambigous_indel} -max_symmetrical_crossroads ${option_max_symmetrical_crossroads} -verbose $verbose -max_truncated_path_length_difference ${max_truncated_path_length_difference}" "
by
" kissnp2Cmd="${kissnp2_bin} -in $h5prefix.h5 -out ${kissprefix}_r -b $b $l $x -P $P -D $D $extend $option_cores_gatb $output_coverage_option -coverage_file ${h5prefix}_cov.h5 -max_ambigous_indel ${max_ambigous_indel} -max_symmetrical_crossroads ${option_max_symmetrical_crossroads} -verbose $verbose" "
Since the flag "-max_truncated_path_length_difference" is not in use, this halts the run.
I do not understand this point, option max_truncated_path_length_difference
is in use in kissplice and it's working.
4- Since the -G flag is already (partially) implemented in the run_discoSnpRad.sh" script file, but not in use, I suggest you simply add the following at the end of the file, to create a second vcf file that will use the reference genome. This way, we get a "denovo" & "refmap" vcf files:
####################################################################### #################### Deal with ref mapped VCF ############################### #######################################################################
T="$(date +%s)" echo -e "$yellow ###############################################################" echo -e " #################### CREATE VCF with REF MAP #######################" echo -e " #################################################################### $reset"
if [ -n "$genome" ]; then # A Reference genome is provided vcf_ref_output="${kissprefix}_refmap.vcf" vcfCreatorCmd="$EDIR/../scripts/run_VCF_creator.sh -G $genome -p ${kissprefix}_raw_filtered.fa -o $vcf_ref_output -I $option_cores_post_analysis $e" echo $green$vcfCreatorCmd$cyan if [[ "$wraith" == "false" ]]; then $vcfCreatorCmd fi
if [ $? -ne 0 ] then echo "$red there was a problem with VCF creation. See how to use the \"run_VCF_creator.sh\" alone.$reset" exit 1 fi
fi echo $reset T="$(($(date +%s)-T))" if [[ "$wraith" == "false" ]]; then echo -e "${yellow}\t###############################################################" echo -e "\t#################### DISCOSNPRAD FINISHED ######################" echo -e "\t###############################################################" Ttot="$(($(date +%s)-Ttot))" echo "DiscoSnpRad total time in seconds: ${Ttot}" echo -e "\t################################################################################################################" echo -e "\t fasta of predicted variant is ""${kissprefix}_raw.fa""" echo -e "\t VCF file for denovo analysis (1-based) is ""${final_output}""" echo -e "\t An IGV ready VCF file (sorted by position, only mapped variants, 0-based) is ""${kissprefix}_refmap_for_IGV.vcf""" echo -e "\t Thanks for using discoSnpRad - http://colibread.inria.fr/discoSnp/ - Forum: http://www.biostars.org/t/discoSnp/" echo -e "\t################################################################################################################${reset}" fi
This is a very nice suggestion. Before to publicly release it, we still have to perform more tests and validations.
3- In the "run_discoSnpRad.sh" script file, on line 527, replace : " kissnp2Cmd="${kissnp2_bin} -in $h5prefix.h5 -out ${kissprefix}_r -b $b $l $x -P $P -D $D $extend $option_cores_gatb $output_coverage_option -coverage_file ${h5prefix}_cov.h5 -max_ambigous_indel ${max_ambigous_indel} -max_symmetrical_crossroads ${option_max_symmetrical_crossroads} -verbose $verbose -max_truncated_path_length_difference ${max_truncated_path_length_difference}" " by " kissnp2Cmd="${kissnp2_bin} -in $h5prefix.h5 -out ${kissprefix}_r -b $b $l $x -P $P -D $D $extend $option_cores_gatb $output_coverage_option -coverage_file ${h5prefix}_cov.h5 -max_ambigous_indel ${max_ambigous_indel} -max_symmetrical_crossroads ${option_max_symmetrical_crossroads} -verbose $verbose" " Since the flag "-max_truncated_path_length_difference" is not in use, this halts the run.
I do not understand this point, option
max_truncated_path_length_difference
is in use in kissplice and it's working.
**I think the issue might be in the kissnp2 binary. There, the max_truncated_path_length_difference
flag seems not to work. Because my runs fail there and work only when I remove the flag.
Maybe you could have a look, I might be missing something...
Thanks !
Best, Komlan.**
Dear Pierre, Here is another suggestion:
And in the "create_IGV_compatible_VCF.sh" file, on line 32: cat $vcffile|grep -v "#"|sort -k 1,1 -k 2,2n |grep -v "^SNP"|grep -v "^INDEL">>$igvfile # from 2 2 6
In some cases (e.g. large vcf files), the sorting gives an error: "sort: write failed: /tmp/sortHNCRWX: No space left on device"
The default /tmp folder where the temporary sorted files are written might be full. Would it be possible to implement the "-T" flag for "sort" so that the temporary files are saved to the working directory instead (where the created vcf fiiles are saved) since there is more chance that this folder has more space to cope with the situation?
Thanks!
Best, Komlan.
Hi Komlan,
quick update: we are struggling with the CI machine, delaying the creation of the new release. It should be done quickly. Best, Pierre
Hi Komlan,
New release is finally out. Pierre
Hi Pierre,
Great, thanks ! Best, Komlan.
Dear Pierre,
I've noticed you made some updates in the GitHb version (especially with DiscoSnpRad) but no new release has been done (to use with conda for example, @lecorguille). Would it be possible to make one please? Thanks!
Best, Komlan.