ExaScience / elprep

elPrep: a high-performance tool for analyzing sequence alignment/map files in sequencing pipelines.

elfasta-conversion error: bufio.Scanner: token too long #68

Open desmodus1984 opened 1 year ago

desmodus1984 commented 1 year ago

Hi, I am trying to run elprep for variant calling, and as a first step I am trying to convert the reference into elfasta format. I have tried it on two different servers, one with the up-to-date version and the other with an older 5.X version. The command

elprep fasta-to-elfasta myse_Super_myo_final.fa myse_Super_myo_final.elfasta

gives me the following error:

2023/02/28 15:07:09 Created log file at /users/PHS0338/jpac1984/logs/elprep/elprep-2023-02-28-15-07-09-560735636-EST.log
2023/02/28 15:07:09 Command line: [elprep fasta-to-elfasta myse_Super_myo_final.fa myse_Super_myo_final.elfasta]
2023/02/28 15:07:09 bufio.Scanner: token too long

I have tried running elprep within conda and with the latest release and both times I got the same error.

Hope this error can be fixed soon.

Thanks;

caherzee commented 1 year ago

Hi,

Is it possible to share the contents of the log file?

How large is the file you are trying to convert to elfasta? (myse_Super_myo_final.fa)

Thanks.

Charlotte


desmodus1984 commented 1 year ago

Hi Charlotte,

The log file is basically this:

elprep version 5.1.3 compiled with go1.17.6 - see http://github.com/exascience/elprep for more information.
2023/03/01 12:06:04 Created log file at /users/PHS0338/jpac1984/logs/elprep/elprep-2023-03-01-12-06-04-871312972-EST.log
2023/03/01 12:06:04 Command line: [elprep fasta-to-elfasta ragtag.356.fasta ragtag.356.elfasta]
2023/03/01 12:06:04 bufio.Scanner: token too long

The file I am trying to convert now is 113Mb. The one I tried earlier, which I was able to convert before and now cannot, is 1.9 GB.

The code I used was this: elprep fasta-to-elfasta ragtag.356.fasta ragtag.356.elfasta

Juan Pablo Aguilar Cabezas

Ecology and Evolutionary Biology Ph.D. Candidate

Department of Biological Sciences

Ohio University, Athens OH



Banthandor commented 1 year ago

I've got the same issue:

The reference file is about 750MB

This is all the log reveals:

2023/04/26 11:26:54 Command line: [elprep fasta-to-elfasta /kyukon/scratch/gent/vo/001/gvo00126/vsc44381/refgenomes/bzrefgenome/filtered.asm.cns.fna /kyukon/scratch/gent/vo/001/gvo00126/vsc44381/refgenomes/bzrefgenome/filtered.asm.cns.elfasta]
2023/04/26 11:26:54 bufio.Scanner: token too long

Banthandor commented 1 year ago

I have solved the issue by limiting the number of characters per line to 80, like my other references:

seqkit seq reference.fasta -w 80

The output fasta file should be accepted by elprep fasta-to-elfasta

showhey0119 commented 1 year ago

Hi all,

I got the same error message when I used elprep vcf-to-elsites. My VCF file contains 1,136 samples; the longest line is over 170,000 characters. After I removed the lines with over 60,000 characters from the VCF file, elprep ran successfully.

Unlike FASTA files, VCF files cannot have line breaks inserted within a record, so re-wrapping is not an option here. I hope this error can be fixed.

Error log is:

elprep version 5.1.3 compiled with go1.17.6 - see http://github.com/exascience/elprep for more information.
2023/06/13 08:32:00 Created log file at /home/stakuno/logs/elprep/elprep-2023-06-13-08-32-00-557298385-JST.log
2023/06/13 08:32:00 Command line: [elprep vcf-to-elsites ./vcf1/all_indel.vcf ./vcf1/all_indel.elsites]
2023/06/13 08:32:00 bufio.Scanner: token too long
panic: bufio.Scanner: token too long

goroutine 1 [running]:
log.Panic({0xc000129c90, 0xc000129ca0, 0x66c580})
    /opt/conda/conda-bld/elprep_1651164620400/_build_env/go/src/log/log.go:354 +0x65
github.com/exascience/elprep/v5/internal.RunPipeline(0xc000228300)
    /opt/conda/conda-bld/elprep_1651164620400/work/internal/misc.go:34 +0x55
github.com/exascience/elprep/v5/intervals.FromVcfFile({0x7ffcc1fe023a, 0x18})
    /opt/conda/conda-bld/elprep_1651164620400/work/intervals/intervals.go:336 +0x432
github.com/exascience/elprep/v5/cmd.VcfToElsites()
    /opt/conda/conda-bld/elprep_1651164620400/work/cmd/convert.go:47 +0x1a5
main.main()
    /opt/conda/conda-bld/elprep_1651164620400/work/main.go:75 +0x317

reesea22 commented 6 months ago

Hello. I have run into a similar error.

$ which go && go version
~/.conda/envs/WES_dev2/bin/go
go version go1.17.6 linux/amd64

$ which elprep && elprep
~/.conda/envs/WES_dev2/bin/elprep
elprep version 5.1.3 compiled with go1.17.6

2024/03/21 10:44:31 Created log file at /home/areese/logs/elprep/elprep-2024-03-21-10-44-31-633572959-EDT.log
2024/03/21 10:44:31 Command line: [elprep sfm /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/debug.reads_to_hg38.P14.sam /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.processed.bam --output-type bam --replace-read-group ID:group1 LB:1 PL:illumina PU:unit1 SM:sample --mark-duplicates --mark-optical-duplicates /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/reads_to_hg38.P14.markedDuplicatesMetrics.txt --remove-duplicates --sorting-order coordinate --bqsr /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/bqsr_recalibration.tbl --known-sites /home/shared/databases/dbsnp/Homo_sapiens_assembly38.dbsnp138.elsites --target-regions /home/shared/projects/WES/hg38_Twist_ILMN_Exome_2.0_Plus_Panel_Combined_Mito.UCSC.bed --reference /home/shared/projects/RNA-seq/references/UCSC/latest/hg38.P14.elfasta --haplotypecaller /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.vcf --timed]
2024/03/21 10:44:31 Executing command: elprep sfm /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/debug.reads_to_hg38.P14.sam /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.processed.bam --output-type bam --replace-read-group ID:group1 LB:1 PL:illumina PU:unit1 SM:sample --mark-duplicates --mark-optical-duplicates /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/reads_to_hg38.P14.markedDuplicatesMetrics.txt --optical-duplicates-pixel-distance 100 --remove-duplicates --bqsr /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/bqsr_recalibration.tbl --reference /home/shared/projects/RNA-seq/references/UCSC/latest/hg38.P14.elfasta --quantize-levels 0 --max-cycle 500 --known-sites /home/shared/databases/dbsnp/Homo_sapiens_assembly38.dbsnp138.elsites --target-regions /home/shared/projects/WES/hg38_Twist_ILMN_Exome_2.0_Plus_Panel_Combined_Mito.UCSC.bed --haplotypecaller /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.vcf --sorting-order coordinate --timed --intermediate-files-output-prefix debug.reads_to_hg38.P14 --intermediate-files-output-type sam
2024/03/21 10:44:31 Splitting...
2024/03/21 10:53:23 Filtering (phase 1)...
2024/03/21 10:53:24 exit status 2

2024/03/21 10:53:23 Created log file at /home/areese/logs/elprep/elprep-2024-03-21-10-53-23-776329743-EDT.log
2024/03/21 10:53:23 Command line: [elprep filter /home/shared/repos/RNA-seq_develop/elprep-splits-f58c8369-975d-439f-8426-5b564ed24d3b/splits/debug.reads_to_hg38.P14-unmapped.sam /home/shared/repos/RNA-seq_develop/elprep-splits-processed-f58c8369-975d-439f-8426-5b564ed24d3b/debug.reads_to_hg38.P14-unmapped.sam --replace-read-group ID:group1 LB:1 PL:illumina PU:unit1 SM:sample --mark-duplicates --remove-duplicates --reference /home/shared/projects/RNA-seq/references/UCSC/latest/hg38.P14.elfasta --max-cycle 500 --known-sites /home/shared/databases/dbsnp/Homo_sapiens_assembly38.dbsnp138.elsites --sorting-order coordinate --timed --pg-cmd-line elprep sfm /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/debug.reads_to_hg38.P14.sam /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.processed.bam --output-type bam --replace-read-group ID:group1 LB:1 PL:illumina PU:unit1 SM:sample --mark-duplicates --mark-optical-duplicates /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/reads_to_hg38.P14.markedDuplicatesMetrics.txt --optical-duplicates-pixel-distance 100 --remove-duplicates --bqsr /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/bqsr_recalibration.tbl --reference /home/shared/projects/RNA-seq/references/UCSC/latest/hg38.P14.elfasta --quantize-levels 0 --max-cycle 500 --known-sites /home/shared/databases/dbsnp/Homo_sapiens_assembly38.dbsnp138.elsites --target-regions /home/shared/projects/WES/hg38_Twist_ILMN_Exome_2.0_Plus_Panel_Combined_Mito.UCSC.bed --haplotypecaller /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.vcf --sorting-order coordinate --timed --intermediate-files-output-prefix debug.reads_to_hg38.P14 --intermediate-files-output-type sam --bqsr-tables-only /home/shared/repos/RNA-seq_develop/elprep-tabs-f58c8369-975d-439f-8426-5b564ed24d3b/debug.reads_to_hg38.P14-unmapped.sam.elrecal --mark-optical-duplicates-intermediate /home/shared/repos/RNA-seq_develop/elprep-metrics-f58c8369-975d-439f-8426-5b564ed24d3b/debug.reads_to_hg38.P14-unmapped.sam --optical-duplicates-pixel-distance 100 --target-regions /home/shared/projects/WES/hg38_Twist_ILMN_Exome_2.0_Plus_Panel_Combined_Mito.UCSC.bed]
2024/03/21 10:53:24 bufio.Scanner: token too long
panic: bufio.Scanner: token too long

goroutine 1 [running]:
log.Panic({0xc0001c5458, 0xc002331890, 0xc002333750})
    /opt/conda/conda-bld/elprep_1651164620400/_build_env/go/src/log/log.go:354 +0x65
github.com/exascience/elprep/v5/bed.ParseBed({0x7ffd3bb9ba8c, 0xc00011e030})
    /opt/conda/conda-bld/elprep_1651164620400/work/bed/bed-files.go:57 +0x56b
github.com/exascience/elprep/v5/cmd.Filter()
    /opt/conda/conda-bld/elprep_1651164620400/work/cmd/filter.go:738 +0x353f
main.main()
    /opt/conda/conda-bld/elprep_1651164620400/work/main.go:67 +0x245

I did not have issues with either the vcf-to-elsites or the fasta-to-elfasta command. Thanks.