chanzuckerberg / shasta

[MOVED] Moved to paoloshasta/shasta. De novo assembly from Oxford Nanopore reads

Runtime error "Does not begin with @" #215

Closed by Meadowlion 3 years ago

Meadowlion commented 3 years ago

While using Shasta I encountered the following runtime error:

2020-Dec-07 14:57:37.645184 A runtime error occurred in thread 1: Read at offset 310688 does not begin with "@".

Full terminal output:

Options in use:
Input files: /media/sb069/00AA8748AA8739641/Assembly in/fastq_runid_e5406a121f8f8a918c224c025253da8adea14d57_0_0.fastq
assemblyDirectory = /media/sb069/00AA8748AA8739641/Assembly Out/TomTom2
memoryMode = filesystem
memoryBacking = disk
threadCount = 0

[Reads] minReadLength = 10000 desiredCoverage = 0 noCache = False palindromicReads.skipFlagging = False palindromicReads.maxSkip = 100 palindromicReads.maxDrift = 100 palindromicReads.maxMarkerFrequency = 10 palindromicReads.alignedFractionThreshold = 0.1 palindromicReads.nearDiagonalFractionThreshold = 0.1 palindromicReads.deltaThreshold = 100

[Kmers] generationMethod = 0 k = 10 probability = 0.1 enrichmentThreshold = 100 file =

[MinHash] version = 0 m = 4 hashFraction = 0.01 minHashIterationCount = 10 alignmentCandidatesPerRead = 20 minBucketSize = 0 maxBucketSize = 10 minFrequency = 2 allPairs = False

[Align] alignMethod = 3 maxSkip = 30 maxDrift = 30 maxTrim = 30 maxMarkerFrequency = 10 minAlignedMarkerCount = 100 minAlignedFraction = 0 matchScore = 6 mismatchScore = -1 gapScore = -1 downsamplingFactor = 0.1 bandExtend = 10 maxBand = 1000 sameChannelReadAlignment.suppressDeltaThreshold = 0 suppressContainments = False

[ReadGraph] creationMethod = 0 maxAlignmentCount = 6 minComponentSize = 100 maxChimericReadDistance = 2 crossStrandMaxDistance = 6 containedNeighborCount = 6 uncontainedNeighborCountPerDirection = 3 removeConflicts = False markerCountPercentile = 0.015 alignedFractionPercentile = 0.12 maxSkipPercentile = 0.12 maxDriftPercentile = 0.12 maxTrimPercentile = 0.015

[MarkerGraph] minCoverage = 10 maxCoverage = 100 minCoveragePerStrand = 0 lowCoverageThreshold = 0 highCoverageThreshold = 256 maxDistance = 30 edgeMarkerSkipThreshold = 100 pruneIterationCount = 6 simplifyMaxLength = 10,100,1000 crossEdgeCoverageThreshold = 0 refineThreshold = 0 reverseTransitiveReduction = False peakFinder.minAreaFraction = 0.08 peakFinder.areaStartIndex = 2

[Assembly] crossEdgeCoverageThreshold = 3 markerGraphEdgeLengthThresholdForConsensus = 1000 consensusCaller = Bayesian:guppy-2.3.5-a storeCoverageData = False storeCoverageDataCsvLengthThreshold = 0 writeReadsByAssembledSegment = False detangleMethod = 0 detangle.diagonalReadCountMin = 1 detangle.offDiagonalReadCountMax = 2 detangle.offDiagonalRatio = 0.3 iterative = False iterative.iterationCount = 3 iterative.pseudoPathAlignMatchScore = 1 iterative.pseudoPathAlignMismatchScore = -1 iterative.pseudoPathAlignGapScore = -1 iterative.mismatchSquareFactor = 3 iterative.minScore = 0 iterative.maxAlignmentCount = 6 iterative.bridgeRemovalIterationCount = 3 iterative.bridgeRemovalMaxDistance = 2

This assembly will use 8 threads.
Setting up consensus caller Bayesian:guppy-2.3.5-a
Using predefined Bayesian consensus caller guppy-2.3.5-a
Bayesian consensus caller configuration name is Human guppy 2.3.5 chr1,chr2,chr3 GM24385 with hg38 priors and 1 pseudocounts 7-23-2019
2020-Dec-07 14:57:36.485658 Begin loading reads from 1 files.
2020-Dec-07 14:57:36.485671 Loading reads from /media/sb069/00AA8748AA8739641/Assembly in/fastq_runid_e5406a121f8f8a918c224c025253da8adea14d57_0_0.fastq
File size: 2491885 bytes.
Allocate buffer time: 1.11923 s.
Read time: 0.000354944 s.
Read rate: 7.0205e+09 bytes/s.
Found 8000 lines in this file.
2020-Dec-07 14:57:37.645184 A runtime error occurred in thread 1: Read at offset 310688 does not begin with "@".

paoloczi commented 3 years ago

Please attach the first 800 KB of the fastq file. You can extract them using the following command:

head --bytes 800000 "/media/sb069/00AA8748AA8739641/Assembly in/fastq_runid_e5406a121f8f8a918c224c025253da8adea14d57_0_0.fastq" > subset.fastq

You can attach it in compressed form if you prefer. With this information I should be able to diagnose this problem.
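To look at the reported location directly, the lines surrounding a byte offset can be printed. The following is a minimal Python sketch. It assumes that the offset in the error message (310688) is a byte offset counted from the start of the file, which this thread does not confirm. The function name `lines_around_offset` is hypothetical and is not part of Shasta.

```python
import io
from collections import deque


def lines_around_offset(stream, offset, context=2):
    """Return [(start_offset, line_bytes), ...] for the line containing
    `offset`, plus up to `context` lines before and after it.

    `stream` must be a binary file object, e.g. open(path, "rb").
    """
    before = deque(maxlen=context)  # most recent lines seen before the target
    result = []
    position = 0                    # byte offset of the next line to be read
    after_remaining = None          # None until the target line has been found
    for line in stream:
        start = position
        position += len(line)
        if after_remaining is None:
            if start <= offset < position:
                # Found the line that contains the requested offset.
                result = list(before) + [(start, line)]
                after_remaining = context
                if after_remaining == 0:
                    break
            else:
                before.append((start, line))
        else:
            # Collect the trailing context lines.
            result.append((start, line))
            after_remaining -= 1
            if after_remaining == 0:
                break
    return result


# Small in-memory demonstration; a real file would be opened with open(path, "rb").
demo = io.BytesIO(b"@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n")
for start, line in lines_around_offset(demo, 16, context=1):
    print(start, line[:60])
```

For the file in this issue, the stream would be the fastq file opened in binary mode and the offset would be 310688. Only the first 60 bytes of each line are printed so that long reads do not flood the terminal.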

paoloczi commented 3 years ago

The fastq file is only 2.5 MB, so you can attach it in full if you prefer.

paoloczi commented 3 years ago

It is likely that this is a format error in the fastq file. Shasta only accepts standard fastq files where each read appears on exactly 4 lines. To confirm this I would need the fastq file. I am closing this for now, but please feel free to reopen it if you can post the fastq file or at least the relevant portion.
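The 4-line layout described above can be checked before starting an assembly. The following is a minimal Python sketch, not Shasta's own parser. The function name `find_fastq_format_errors` and the specific checks (header begins with "@", third line begins with "+", sequence and quality lengths match) are assumptions chosen for illustration, and the offsets it reports are byte offsets from the start of the file.

```python
import io


def find_fastq_format_errors(stream, max_errors=10):
    """Scan a binary stream as strict 4-line FASTQ records.

    Returns a list of (record_byte_offset, first_line_number, message).
    An empty list means every record fits the 4-line layout.
    """
    errors = []
    offset = 0       # byte offset of the next unread line
    line_number = 0  # number of lines consumed so far
    while len(errors) < max_errors:
        record_offset = offset
        lines = []
        for _ in range(4):
            line = stream.readline()
            if not line:
                break
            lines.append(line)
            offset += len(line)
        if not lines:
            break  # clean end of file
        first_line_number = line_number + 1
        line_number += len(lines)
        if len(lines) < 4:
            errors.append((record_offset, first_line_number,
                           "truncated record (fewer than 4 lines)"))
            break
        header, seq, plus, qual = (l.rstrip(b"\r\n") for l in lines)
        if not header.startswith(b"@"):
            errors.append((record_offset, first_line_number,
                           'header does not begin with "@"'))
        elif not plus.startswith(b"+"):
            errors.append((record_offset, first_line_number,
                           'third line does not begin with "+"'))
        elif len(seq) != len(qual):
            errors.append((record_offset, first_line_number,
                           "sequence and quality lengths differ"))
    return errors


# In-memory demonstration: the second record is missing its "@".
demo = io.BytesIO(b"@r1\nACGT\n+\nIIII\nr2\nGG\n+\nII\n")
for record_offset, line_no, message in find_fastq_format_errors(demo):
    print(record_offset, line_no, message)
```

Once one record is malformed, later records may be read out of frame, so the first reported error is the one worth inspecting. For a real file, pass `open(path, "rb")` as the stream.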