Open agolicz opened 8 months ago
Thank you very much for your support. . It seems that the long-read data you provided (lr.fq.gz) did not exact. You supplied long-read data, but CRAQ appears to have only used short-read data;
Is your command was craq -g assembly.fa -sms lr.fq.gz -ngs SRR10382366_1.fastq, SRR10382366_2.fastq -x map-hifi ?
Please check if your 'lr.fq.gz' file exists. Additionally, If you are using symbolic links of "lr.fq.gz" , please ensure that the filename of lr.fq.gz is the same to your raw SMS.fq file.
Additionally, may I ask if your lr.fq.gz file actually exists? If you only have short-reads data, you can use the following command: 'craq -g assembly.fa -ngs SRR10382366_1.fastq, SRR10382366_2.fastq'.
Indeed my error! I might have just worked out our other issue too. I will report back. Agnieszka
Alright, Agnieszka. Additionally, I need to point out that if you want to verify errors in the genome, the two files "locER_out/out_final.CRE.bed " and "strER_out/out_final.CSE.bed" are more valuable, as they mainly impact the final AQI value and the out_final.Report results. The file "low_confidence.bed" only reports regions in the genome with relatively low reads coverage, and it might be empty. To avoid potential confusion for users , I am considering moving this file to another folder."
If you have trouble, let me know.
Ok, thanks! Would it be possible to maybe also update the documentation with '"low_confidence.bed" only reports regions in the genome with relatively low reads coverage'? Current wording is a bit ambiguous and I think my students understood it wrong.
I think one other thing we might have run into is that some of our genomes have large chromosomes and need a csi index, but I don't think this is handled.
Agnieszka
Hi, Agnieszka, Thanks, the documentation was update to avoid potential confusion.
If you encounter the following issue : “[E::hts_idx_check_range] Region 631441106..631444822 cannot be stored in a bai index. Try using a csi index [E::sam_index] Read 'sms2' with ref_name='ref1', ref_length=998496443, flags=256, pos=631441107 cannot be indexed samtools index: failed to create index for "LRout/tmp_bam/sms.fa_sort.bam": Numerical result out of range”
Don't worry, CRAQ will continue to run. In fact, I only use the .bai suffix to ensure that the BAM file is sorted.
Hi, Thanks very much for the great tool. We've been trying CRAQ out in several of our assembly projects and we sometimes run into strange behavior where analysis seems to finish ok, but low_confidence.bed is empty and the Low-confident.Rate in out_final.Report is 0. We know for sure that is not the case (from quality scores) and as a parallel run on a different machine gave >100 low confidence points. The fasta file after breaks is also not produced. Would there be a chance to look into it? It is a 1.1Gb genome and we are using a 500Gb mem, 28 CPU machine, so resources should be plenty.