drivenbyentropy / aptasuite

A full-featured bioinformatics software collection for the comprehensive analysis of aptamers in HT-SELEX experiments.
https://drivenbyentropy.github.io/
GNU General Public License v3.0

aptacluster error #3

Closed ghost closed 6 years ago

ghost commented 6 years ago

Hi,

First, I think the example command line in the manual is wrong: https://github.com/drivenbyentropy/aptasuite/wiki/Basic-Usage

it shows:

java -jar /path/to/aptacluster.jar

but shouldn't it be:

java -jar /path/to/aptasuite.jar

Using the following command, I get the error below:

java -jar ~/bin/aptasuite.jar -parse -structures -trace -cluster -export pool,cycles,clusters,structure -config config.txt

Parsing completes OK, but then:

Starting AptaCluster
Using existing data
Exception in thread "main" java.util.NoSuchElementException: Key 'Aptacluster.RandomizedRegionSize' does not map to an existing object!
        at org.apache.commons.configuration2.AbstractConfiguration.throwMissingPropertyException(AbstractConfiguration.java:1901)
        at org.apache.commons.configuration2.AbstractConfiguration.checkNonNullValue(AbstractConfiguration.java:1888)
        at org.apache.commons.configuration2.AbstractConfiguration.getInt(AbstractConfiguration.java:1252)
        at aptasuite.CLI.runAptaCluster(CLI.java:483)
        at aptasuite.CLI.<init>(CLI.java:177)
        at aptasuite.Aptasuite.main(Aptasuite.java:40)

My config file:

# Experiment configuration
Experiment.name = "SELEX against target Cp"
Experiment.description = "round R6 NEG R8T R83"

Experiment.projectPath = ./ 

Experiment.primer5 = GTGTCACCGCCCATTTCC
# OPTIONAL, only specify if the 3' primer was part of the sequenced data.
# If not specified, we need to specify the randomized region size
Experiment.primer3 = ACTGCGCCTCAGGATGGA

SelectionCycle.name = RoundR6
SelectionCycle.round = 6
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

SelectionCycle.name = RoundNEG
SelectionCycle.round = 7
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = True

SelectionCycle.name = Round8T
SelectionCycle.round = 8
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

SelectionCycle.name = Round83
SelectionCycle.round = 9
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

Performance.maxNumberOfCores = 12

# The default back-end for storing aptamer sequence information
AptamerPool.backend = MapDBAptamerPool

# The default back-end for storing the counts of each aptamer in a
# particular selection cycle
SelectionCycle.backend = MapDBSelectionCycle

# The default back-end storing the secondary structure information 
StructurePool.backend = MapDBStructurePool

AptaplexParser.isPerFile = True

AptaplexParser.forwardFiles = /home/theoa/d/059_minion_ben/clip/NEG_R1.fastq
AptaplexParser.forwardFiles = /home/theoa/d/059_minion_ben/clip/R6_R1.fastq
AptaplexParser.forwardFiles = /home/theoa/d/059_minion_ben/clip/R83_R1.fastq
AptaplexParser.forwardFiles = /home/theoa/d/059_minion_ben/clip/R8T_R1.fastq

AptaplexParser.reverseFiles = /home/theoa/d/059_minion_ben/clip/NEG_R2.fastq
AptaplexParser.reverseFiles = /home/theoa/d/059_minion_ben/clip/R6_R2.fastq
AptaplexParser.reverseFiles = /home/theoa/d/059_minion_ben/clip/R83_R2.fastq
AptaplexParser.reverseFiles = /home/theoa/d/059_minion_ben/clip/R8T_R2.fastq

# Specifies the reader for the sequences depending on the input format (case sensitive).
# Current options are: FastqReader, RawReader
AptaplexParser.reader = FastqReader

# For paired-end data only. The smallest overlap required between the forward and 
# reverse read when creating a single contig out of the two.
AptaplexParser.PairedEndMinOverlap = 15

# Maximal number of mutations in the overlapping region for a sequence to be accepted
AptaplexParser.PairedEndMaxMutations = 5

# Highest score of the current quality. 55 for phred model. 
AptaplexParser.PairedEndMaxScoreValue = 55

# Maximal number of mutations allowed in the primer regions
AptaplexParser.PrimerTolerance = 3 

Log file:

[3:55:19 | INFO | main]: utilities.Configuration
Reading configuration from file. 

[3:55:19 | INFO | main]: utilities.Configuration
Using the following parameters: 
Experiment.name : "SELEX against target Cp"
Experiment.description : "round R6 NEG R8T R83"
Experiment.projectPath : ./
Experiment.primer5 : GTGTCACCGCCCATTTCC
Experiment.primer3 : ACTGCGCCTCAGGATGGA
SelectionCycle.name : [RoundR6, RoundNEG, Round8T, Round83]
SelectionCycle.round : [6, 7, 8, 9]
SelectionCycle.isControlSelection : [False, False, False, False]
SelectionCycle.isCounterSelection : [False, True, False, False]
Performance.maxNumberOfCores : 12
AptamerPool.backend : MapDBAptamerPool
SelectionCycle.backend : MapDBSelectionCycle
StructurePool.backend : MapDBStructurePool
AptaplexParser.isPerFile : True
AptaplexParser.forwardFiles : [/home/theoa/d/059_minion_ben/clip/NEG_R1.fastq, /home/theoa/d/059_minion_ben/clip/R6_R1.fastq, /home/theoa/d/059_minion_ben/clip/R83_R1.fastq, /home/theoa/d/059_minion_ben/clip/R8T_R1.fastq]
AptaplexParser.reverseFiles : [/home/theoa/d/059_minion_ben/clip/NEG_R2.fastq, /home/theoa/d/059_minion_ben/clip/R6_R2.fastq, /home/theoa/d/059_minion_ben/clip/R83_R2.fastq, /home/theoa/d/059_minion_ben/clip/R8T_R2.fastq]
AptaplexParser.reader : FastqReader
AptaplexParser.PairedEndMinOverlap : 15
AptaplexParser.PairedEndMaxMutations : 5
AptaplexParser.PairedEndMaxScoreValue : 55
AptaplexParser.PrimerTolerance : 3
Aptasim.AmplificationEfficiency : 0.995
MapDBSelectionCycle.bloomFilterCollisionProbability : 0.001
Aptatrace.FilterClusters : true
Export.IncludePrimerRegions : true
Export.PoolCardinalityFormat : frequencies
Aptatrace.Alpha : 10
AptaplexParser.BlockingQueueSize : 500
ClusterContainer.backend : MapDBClusterContainer
MapDBAptamerPool.bloomFilterCollisionProbability : 0.001
Aptacluster.KmerSize : 3
Aptasim.NumberOfSeeds : 100
MapDBStructurePool.bloomFilterCollisionProbability : 0.001
Aptasim.NucleotideDistribution : [0.25, 0.25, 0.25, 0.25]
Aptasim.MinSeedAffinity : 80
Aptacluster.EditDistance : 5
Export.compress : true
Aptasim.NumberOfSequences : 1000000
Aptasim.BaseMutationRates : [0.25, 0.25, 0.25, 0.25]
Aptasim.HmmDegree : 2
Parser.backend : AptaplexParser
AptaplexParser.BarcodeTolerance : 1
MapDBAptamerPool.bloomFilterCapacity : 500000000
Export.MinimalClusterSize : 1
Aptasim.MaxSequenceCount : 10
MapDBAptamerPool.maxTreeMapCapacity : 1000000
Aptasim.MaxSequenceAffinity : 25
Aptasim.SelectionPercentage : 0.2
Aptasim.MutationProbability : 0.05
Export.SequenceFormat : fastq
Aptacluster.KmerCutoffIterations : 10000
Aptatrace.KmerLength : 6
Aptacluster.LSHIterations : 5
MapDBStructurePool.maxTreeMapCapacity : 500000
Aptasim.RandomizedRegionSize : 40
Aptatrace.OutputClusters : true

[3:55:19 | INFO | main]: aptasuite.CLI
If you use this software in your research, please cite AptaPLEX as Hoinka, J., & Przytycka, T. (2016). AptaPLEX - A dedicated, multithreaded demultiplexer for HT-SELEX data. Methods. http://doi.org/10.1016/j.ymeth.2016.04.011 

[3:55:19 | INFO | main]: aptasuite.CLI
Creating Database 

[3:55:19 | INFO | main]: lib.aptamer.datastructures.MapDBAptamerPool
Instantiating MapDBAptamerPool 

[3:55:19 | CONFIG | main]: lib.aptamer.datastructures.MapDBAptamerPool
Created new file ./pooldata/data0000.mapdb 

[3:55:19 | CONFIG | main]: lib.aptamer.datastructures.MapDBAptamerPool
Created new bounds file ./pooldata/bounds_data0000.mapdb 

[3:55:19 | CONFIG | main]: lib.aptamer.datastructures.MapDBAptamerPool
Created new inverse file ./pooldata/data_inverse.mapdb 

[3:55:19 | CONFIG | main]: lib.aptamer.datastructures.MapDBAptamerPool
AptamerPool instantiation took 0.661 seconds 

[3:55:20 | INFO | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Processing selection cycle RoundR6 

[3:55:20 | CONFIG | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Creating new file './cycledata/6_RoundR6.mapdb' for selection cycle RoundR6. 

[3:55:20 | INFO | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Processing selection cycle RoundNEG 

[3:55:20 | CONFIG | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Creating new file './cycledata/7_RoundNEG.mapdb' for selection cycle RoundNEG. 

[3:55:20 | INFO | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Processing selection cycle Round8T 

[3:55:20 | CONFIG | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Creating new file './cycledata/8_Round8T.mapdb' for selection cycle Round8T. 

[3:55:20 | INFO | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Processing selection cycle Round83 

[3:55:20 | CONFIG | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Creating new file './cycledata/9_Round83.mapdb' for selection cycle Round83. 

[3:55:20 | INFO | main]: aptasuite.CLI
Initializing Experiment 

[3:55:20 | INFO | main]: aptasuite.CLI
Experiment Setup
│
├── Round 0: N/A
│    │
│    ├─ Counter Selections: N/A
│    │
│    ├─ Control Selections: N/A
│
│
├── Round 1: N/A
│    │
│    ├─ Counter Selections: N/A
│    │
│    ├─ Control Selections: N/A
│
│
├── Round 2: N/A
│    │
│    ├─ Counter Selections: N/A
│    │
│    ├─ Control Selections: N/A
│
│
├── Round 3: N/A
│    │
│    ├─ Counter Selections: N/A
│    │
│    ├─ Control Selections: N/A
│
│
├── Round 4: N/A
│    │
│    ├─ Counter Selections: N/A
│    │
│    ├─ Control Selections: N/A
│
│
├── Round 5: N/A
│    │
│    ├─ Counter Selections: N/A
│    │
│    ├─ Control Selections: N/A
│
│
├── Round 6: RoundR6 (0)
│    │
│    ├─ Counter Selections: N/A
│    │
│    ├─ Control Selections: N/A
│
│
├── Round 7: N/A
│    │
│    ├─ Counter Selections:
│    │  │
│    │  └── RoundNEG (0)
│    │
│    ├─ Control Selections: N/A
│
│
├── Round 8: Round8T (0)
│    │
│    ├─ Counter Selections: N/A
│    │
│    ├─ Control Selections: N/A
│
│
└── Round 9: Round83 (0)
    │
    ├─ Counter Selections: N/A
    │
    ├─ Control Selections: N/A

[3:55:20 | INFO | main]: aptasuite.CLI
Initializing parser AptaplexParser 

[3:55:20 | INFO | main]: aptasuite.CLI
Starting AptaPlex: 

[3:55:20 | INFO | main]: aptasuite.CLI
Parsing... 

[3:55:20 | CONFIG | AptaPlex Producer]: lib.parser.aptaplex.FastqReader
Opened forward file in fastq format/home/theoa/d/059_minion_ben/clip/NEG_R1.fastq 

[3:55:20 | CONFIG | AptaPlex Producer]: lib.parser.aptaplex.FastqReader
Opened forward reverse in fastq format/home/theoa/d/059_minion_ben/clip/NEG_R2.fastq 

[3:56:03 | CONFIG | AptaPlex Producer]: lib.parser.aptaplex.FastqReader
Opened forward file in fastq format/home/theoa/d/059_minion_ben/clip/R6_R1.fastq 

[3:56:03 | CONFIG | AptaPlex Producer]: lib.parser.aptaplex.FastqReader
Opened forward reverse in fastq format/home/theoa/d/059_minion_ben/clip/R6_R2.fastq 

[3:56:37 | CONFIG | AptaPlex Producer]: lib.parser.aptaplex.FastqReader
Opened forward file in fastq format/home/theoa/d/059_minion_ben/clip/R83_R1.fastq 

[3:56:37 | CONFIG | AptaPlex Producer]: lib.parser.aptaplex.FastqReader
Opened forward reverse in fastq format/home/theoa/d/059_minion_ben/clip/R83_R2.fastq 

[3:57:10 | CONFIG | AptaPlex Producer]: lib.parser.aptaplex.FastqReader
Opened forward file in fastq format/home/theoa/d/059_minion_ben/clip/R8T_R1.fastq 

[3:57:10 | CONFIG | AptaPlex Producer]: lib.parser.aptaplex.FastqReader
Opened forward reverse in fastq format/home/theoa/d/059_minion_ben/clip/R8T_R2.fastq 

[3:57:45 | CONFIG | AptaPlex Producer]: lib.parser.aptaplex.AptaPlexProducer
Added poison pill to parsing queue 

[3:57:45 | CONFIG | AptaPlex Consumer 5]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 8]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 3]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 6]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 10]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 2]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 1]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 7]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 4]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 9]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | AptaPlex Consumer 11]: lib.parser.aptaplex.AptaPlexConsumer
Encountered poison pill. Exiting thread. 

[3:57:45 | CONFIG | main]: lib.aptamer.datastructures.MapDBAptamerPool
Closing pool file handles. 

[3:57:45 | CONFIG | main]: lib.aptamer.datastructures.MapDBAptamerPool
Reopened as read only file ./pooldata/data0000.mapdb 

[3:57:45 | CONFIG | main]: lib.aptamer.datastructures.MapDBAptamerPool
Reopened as read only file ./pooldata/inverse_data0000.mapdb 

[3:57:45 | CONFIG | main]: lib.aptamer.datastructures.MapDBAptamerPool
Reopened as read only file ./pooldata/bounds_data0000.mapdb 

[3:57:45 | CONFIG | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Reopened as read only file ./cycledata/6_RoundR6.mapdb 

[3:57:45 | CONFIG | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Reopened as read only file ./cycledata/7_RoundNEG.mapdb 

[3:57:45 | CONFIG | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Reopened as read only file ./cycledata/8_Round8T.mapdb 

[3:57:45 | CONFIG | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Reopened as read only file ./cycledata/9_Round83.mapdb 

[3:57:45 | INFO | main]: aptasuite.CLI
Parsing Completed in 145.485 seconds.

[3:57:45 | INFO | main]: aptasuite.CLI
Selection Cycle Statistics 

[3:57:45 | INFO | main]: aptasuite.CLI
RoundR6 (106) 

[3:57:45 | INFO | main]: aptasuite.CLI
RoundNEG (81) 

[3:57:45 | INFO | main]: aptasuite.CLI
Round8T (592) 

[3:57:45 | INFO | main]: aptasuite.CLI
Round83 (144) 

[3:57:45 | INFO | main]: aptasuite.CLI
Starting Structure Predition 

[3:57:45 | INFO | main]: aptasuite.CLI
Using existing sequencing data 

[3:57:46 | INFO | main]: lib.aptamer.datastructures.MapDBStructurePool
Instantiating MapDBStructurePool 

[3:57:46 | CONFIG | main]: lib.aptamer.datastructures.MapDBStructurePool
Created new file ./structuredata/data0001.mapdb 

[3:57:46 | CONFIG | main]: lib.aptamer.datastructures.MapDBStructurePool
StructurePool instantiation took 0.073 seconds 

[3:57:46 | INFO | main]: aptasuite.CLI
Starting Structure Prediction using 12 threads: 

[3:57:46 | INFO | main]: aptasuite.CLI
Predicting... 

[3:57:49 | CONFIG | CapR Producer]: lib.structure.capr.CapRFactoryProducer
Added poison pill to CapR queue 

[3:57:53 | CONFIG | CapR Consumer 4]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 8]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 2]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 7]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 1]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 5]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 6]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 11]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 3]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 10]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:54 | CONFIG | CapR Consumer 9]: lib.structure.capr.CapRFactoryConsumer
Encountered poison pill. Exiting thread. 

[3:57:55 | INFO | main]: aptasuite.CLI
Structure prediction completed in 9.014 seconds.

[3:57:55 | CONFIG | main]: lib.aptamer.datastructures.MapDBStructurePool
Reopened as read only file ./structuredata/data0000.mapdb 

[3:57:55 | INFO | main]: aptasuite.CLI
If you use this software in your research, please cite AptaCLUSTER as Hoinka, J., Berezhnoy, A., Sauna, Z. E., Gilboa, E., & Przytycka, T. M. (2014). AptaCluster - A method to cluster HT-SELEX aptamer pools and lessons from its application. In Lecture Notes in Computer Science  (Vol. 8394 LNBI, pp. 115–128). http://doi.org/10.1007/978-3-319-05269-4_9 

[3:57:55 | INFO | main]: aptasuite.CLI
Starting AptaCluster 

[3:57:55 | INFO | main]: aptasuite.CLI
Using existing data 

[3:57:55 | CONFIG | main]: lib.aptamer.datastructures.MapDBClusterContainer
Creating new file './clusterdata/clusters.mapdb' for cluster storage. 

drivenbyentropy commented 6 years ago

Hi,

Thank you for pointing out the typo in the usage description, I have corrected it accordingly.

As for the error, it looks like you are missing a parameter in your configuration file, namely

Aptacluster.RandomizedRegionSize

Please have a look at the AptaCluster Wiki under "Mandatory Configuration File Parameters" for further details.

Thanks!
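A missing key like this only surfaces after parsing and structure prediction have finished, so it can be worth checking the configuration file up front. The sketch below is a hypothetical helper, not part of AptaSuite: it reads `key = value` lines as they appear in the config above, treats `#` lines as comments, and collects repeated keys into lists (matching how the log prints `SelectionCycle.name : [RoundR6, ...]`). The required-key list holds only `Aptacluster.RandomizedRegionSize`, the key named in this thread; the full set of mandatory parameters is on the wiki page. Note that the `Aptasim.RandomizedRegionSize : 40` entry in the log is a different key and does not satisfy AptaCluster.

```python
from typing import Dict, List


def read_config(text: str) -> Dict[str, List[str]]:
    """Parse 'key = value' lines; '#' lines are comments, repeated keys accumulate."""
    config: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        config.setdefault(key.strip(), []).append(value.strip())
    return config


# Hypothetical pre-flight list: only the key from this thread; see the wiki for the rest.
REQUIRED_FOR_APTACLUSTER = ["Aptacluster.RandomizedRegionSize"]


def missing_keys(config: Dict[str, List[str]], required: List[str]) -> List[str]:
    """Return the required keys that do not appear in the parsed config."""
    return [key for key in required if key not in config]
```

Usage: `missing_keys(read_config(open("config.txt").read()), REQUIRED_FOR_APTACLUSTER)` returns an empty list once the parameter has been added.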

ghost commented 6 years ago

Thanks. It seems to be working now. I have a couple of other questions. First, I have contacted your lead author about an apparent mistake in the SRA archive of reads related to AptaSuite; please see the message below. Second, can you explain the 'motif context trace' diagrams? I cannot find an explanation of the meaning of P, H, I, etc. I guess they represent the changing frequency of the motif in some way.

Theo


Hi again,

I've noticed that the reason I didn't find any round 5 reads is that, in the SRA information, the index sequences for round 2 and round 5 are the same, but in SRX1653575 the sequence is called 'round 2' and the 'round 5' index is not given.

Round2: CGTGGGAGAGAGGAAGAGGGATGGG-N40-CGACGACTCGCTGAGATCGAGAATC

Can you please let me know the index used for round 5?

The reads appear to have the following forward indices:

TGAAG: unknown
GGGAG: unknown
CGTGG: round 2/5
CCGGG: unknown
ATCCG: unknown
ACAGG: round 3
GCCGG: round 4
CATGG: unknown
AAAGG: unknown

Thanks,

Theo
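Per-index counts like the ones above can be tallied directly from the reads. This is a minimal sketch, assuming (as the Round2 sequence above suggests) that the index is the first five bases of each forward read, and counting exact matches only, in line with demultiplexing without mismatches. The index-to-round table is copied from the list above; the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Index-to-round assignments as listed in the email above; everything else is "unknown".
KNOWN_INDICES: Dict[str, str] = {
    "CGTGG": "round 2/5",
    "ACAGG": "round 3",
    "GCCGG": "round 4",
}


def tally_indices(reads: Iterable[str], index_length: int = 5) -> Counter:
    """Count reads by their leading index (assumed to be the first index_length bases)."""
    return Counter(read[:index_length] for read in reads if len(read) >= index_length)


def label(index: str) -> str:
    """Map an index to its round, or 'unknown' if it is not in the table."""
    return KNOWN_INDICES.get(index, "unknown")
```

Printing `label(idx)` next to each entry of `tally_indices(reads).most_common()` reproduces a listing like the one above with read counts attached.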


Dear Teresa,

I am attempting to utilise your SRA data as cited in the publication, " Large scale analysis of the mutational landscape in HT-SELEX improves aptamer discovery", Nucleic Acids Res. 2015 Jul 13; 43(12): 5699–5707, as a reference dataset for benchmarking aptamer design tools. However, using the barcode information contained in the SRA: https://www.ncbi.nlm.nih.gov/sra/SRX1653575[accn] I cannot find any sequences from round 5 of the SELEX as described in your paper(s). The data appears to be divided as follows:

SRR3279660.fastq; round 2= 4864618 reads; round 3 = 5314853; round 4 = 11381194 reads; round 5 = zero reads

SRR3279661.fastq; round 2 = 5699489 reads; round 3 = 8065185; round 4 = 5388503 reads; round 5 = zero reads

I allowed no mismatches when demultiplexing barcodes (usearch 10, fastx_demux command).

Also, the data appears to be single-ended, not paired as stated in the SRA.

Can you please advise?

Thanks for your help,

Theo Allnutt

Dr Theo Allnutt Bioinformatics Research Fellow School of Medicine, Faculty of Health Waurn Ponds Campus 0352479571

Deakin University Locked Bag 20000, Geelong, VIC 3220 +61 3 524 79571 theo.allnutt@deakin.edu.au http://www.deakin.edu.au/health/faculty-research/bioinformatics-core-research-group https://bioinformatics-deakin.github.io/portal1/

Deakin University CRICOS Provider Code 00113B

Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone.

Deakin University does not warrant that this email and any attachments are error or virus free.

From: drivenbyentropy [mailto:notifications@github.com] Sent: Tuesday, 24 October 2017 5:55 AM To: drivenbyentropy/aptasuite aptasuite@noreply.github.com Cc: bioinformatics bioinformatics@deakin.edu.au; Author author@noreply.github.com Subject: Re: [drivenbyentropy/aptasuite] aptacluster error (#3)

drivenbyentropy commented 6 years ago

AptaSuite is a re-implementation of a collection of algorithms, not the original implementations referenced in the corresponding papers. As such, the issue tracker is not the place to discuss the data accompanying the original manuscripts.

The motif context traces are a visual representation of how likely a particular motif is to be found in a given secondary structure context in each of the selection cycles. Here, the contexts H, B, I, M, D, and P correspond to Hairpin, Bulge Loop, Inner Loop, Multiple Loop, Dangling End, and Paired. I have extended the wiki to state the meaning of these abbreviations in the appropriate sections.
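As an illustration of the kind of per-cycle quantity a context trace summarizes, the sketch below keeps the abbreviations in a lookup table and turns hypothetical counts of motif occurrences per structural context into fractions for each selection cycle. It is a simple normalization for reading the letters, not AptaTRACE's actual computation (which this thread does not describe); the function name and the counts are made up.

```python
from typing import Dict

# Context abbreviations as explained above.
CONTEXTS: Dict[str, str] = {
    "H": "Hairpin",
    "B": "Bulge Loop",
    "I": "Inner Loop",
    "M": "Multiple Loop",
    "D": "Dangling End",
    "P": "Paired",
}


def context_profile(counts_by_cycle: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Normalize per-cycle context weights so each cycle's values sum to 1."""
    profile: Dict[str, Dict[str, float]] = {}
    for cycle, counts in counts_by_cycle.items():
        unknown = set(counts) - set(CONTEXTS)
        if unknown:
            raise ValueError(f"unknown context letters: {sorted(unknown)}")
        total = sum(counts.values())
        profile[cycle] = {c: (counts.get(c, 0.0) / total if total else 0.0) for c in CONTEXTS}
    return profile
```

A shift of weight from P toward H between two cycles would then read as the motif moving from paired regions into hairpin loops over the selection.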

ghost commented 6 years ago

Thanks for the explanation.

One further issue: I seem to be getting a very high rate of 3' (reverse) primer errors, but the primer looks fine in the data. Here is an example from the log:

Total Reads: 21907
Accepted Reads: 52
Contig Assembly Fails: 0
Invalid Alphabet: 0
5' Primer Error: 0
3' Primer Error: 21812
Invalid Cycle: 0
Total Primer Overlaps: 90734

I am trying to import data in FASTA format that has already been merged into contigs and is therefore single-ended, e.g.:

>R8T.7406324
TCACCGCCCATTTCCTGGGGGGGAAGGAATGTTATAGGTTTGGTAGGAGGACGGTTCCATCCTGAGGCGCAGT
>R8T.5935025
TCACCGCCCATTTCCCATCTCAGCCGCTGCGTTGGGTAGGGGAAAAAGGGGTGCATCCATCCTGAGGCGCAG
>R8T.5870928
TCACCGCCCATTTCCCATGCACGGGGGGGATGAATACGGACCTCGAGGGAGTGGGTGCATCCTGAGGCGCAGT
>R8T.5870929
TCACCGCCCATTTCCCATAGAGGGGGAAGGCAGTAGTGAAGCGGCGTGGCCTACAATCCATCCTGAGGCGCAGT
>R8T.5870927
TCACCTCCCATTTCCGGGGGAGACAAAAGAGTGCTCCTGGGGTTCCGTCCATGGGTCCATCCTGAGGCGCAGT
>R8T.5870924
TCACCGCCCATTTCCGCGAGGGTCGGCATGATCGCAGCGGGGGAGTAAGGCTCCGTCCATCCTGAGGCGCAGT
>R8T.5935024
TCACCGCCCATTTCCAACGAACCGCGCTTAAGATTCCTTACCGTTCCAGTATCGCTCCATCCTGAGGCGCAGT
>R8T.5870923
TCACCGCCCATTTCCACGTCGGAGGGGGACAGGGTTAGATTTAATTAGGGCCAGTCCATCCTGAGGCGCAGT

N.B. I always get some 1-2 bp wobble in the aptamer read length in real data.

The config file is below. As you can see, the reverse primer is always present, so why does the parser report a 3' primer error?
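One thing worth checking here (an observation from the reads above, not something confirmed in this thread): the reads end in the reverse complement of the configured `Experiment.primer3`, or in a one-mismatch variant of it. `TGCGCCTCAGGATGGA` reverse-complements to `TCCATCCTGAGGCGCA`, and the primer from the first config, `ACTGCGCCTCAGGATGGA`, reverse-complements to `TCCATCCTGAGGCGCAGT`, which is exactly how most of these reads end. The sketch below is not AptaSuite code: it reverse-complements a primer, searches reads allowing a number of substitutions (in the spirit of `AptaplexParser.PrimerTolerance`, though AptaPlex's own matching may differ), and extracts the region between the primers, whose length is what the 1-2 bp wobble affects. The helper names are hypothetical; four of the reads above are included as data.

```python
from typing import Iterable, Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_with_mismatches(seq: str, pattern: str, max_mismatches: int, start: int = 0) -> int:
    """First index >= start where pattern matches with <= max_mismatches substitutions, else -1."""
    n = len(pattern)
    for i in range(start, len(seq) - n + 1):
        mismatches = 0
        for a, b in zip(seq[i:i + n], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # inner loop finished without exceeding the tolerance
            return i
    return -1


def count_hits(reads: Iterable[str], pattern: str, max_mismatches: int) -> int:
    """Number of reads containing pattern within the mismatch tolerance."""
    return sum(1 for r in reads if find_with_mismatches(r, pattern, max_mismatches) >= 0)


def randomized_region(read: str, primer5: str, primer3_in_read: str,
                      max_mismatches: int) -> Optional[str]:
    """Sequence between the 5' primer and the 3' primer (as it appears in the read)."""
    s5 = find_with_mismatches(read, primer5, max_mismatches)
    if s5 < 0:
        return None
    end5 = s5 + len(primer5)
    s3 = find_with_mismatches(read, primer3_in_read, max_mismatches, start=end5)
    return None if s3 < 0 else read[end5:s3]


PRIMER5 = "TCACCGCCCATTTCC"
PRIMER3 = "TGCGCCTCAGGATGGA"  # as configured in the second config file
READS = [
    "TCACCGCCCATTTCCTGGGGGGGAAGGAATGTTATAGGTTTGGTAGGAGGACGGTTCCATCCTGAGGCGCAGT",
    "TCACCGCCCATTTCCCATCTCAGCCGCTGCGTTGGGTAGGGGAAAAAGGGGTGCATCCATCCTGAGGCGCAG",
    "TCACCGCCCATTTCCCATGCACGGGGGGGATGAATACGGACCTCGAGGGAGTGGGTGCATCCTGAGGCGCAGT",
    "TCACCGCCCATTTCCCATAGAGGGGGAAGGCAGTAGTGAAGCGGCGTGGCCTACAATCCATCCTGAGGCGCAGT",
]
```

Comparing `count_hits(READS, PRIMER3, 0)` with `count_hits(READS, reverse_complement(PRIMER3), 0)` shows which orientation actually occurs in the data.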

Thanks,

Theo

# Experiment configuration
Experiment.name = "SELEX against target Cp"
Experiment.description = "round R6 NEG R8T R83"
Experiment.projectPath = ./

Experiment.primer5 = TCACCGCCCATTTCC
# OPTIONAL, only specify if the 3' primer was part of the sequenced data.
# If not specified, we need to specify the randomized region size
Experiment.primer3 = TGCGCCTCAGGATGGA

SelectionCycle.name = R6
SelectionCycle.round = 6
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

SelectionCycle.name = NEG
SelectionCycle.round = 7
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = True

SelectionCycle.name = 8T
SelectionCycle.round = 8
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

SelectionCycle.name = 83
SelectionCycle.round = 9
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

Performance.maxNumberOfCores = 12

# The default back-end for storing aptamer sequence information
AptamerPool.backend = MapDBAptamerPool

# The default back-end for storing the counts of each aptamer in a
# particular selection cycle
SelectionCycle.backend = MapDBSelectionCycle

# The default back-end storing the secondary structure information
StructurePool.backend = MapDBStructurePool

AptaplexParser.isPerFile = True

AptaplexParser.forwardFiles = /home/theoa/d/068_aptamer_benchmarking/reads/NEG.fasta
AptaplexParser.forwardFiles = /home/theoa/d/068_aptamer_benchmarking/reads/R6.fasta
AptaplexParser.forwardFiles = /home/theoa/d/068_aptamer_benchmarking/reads/R83.fasta
AptaplexParser.forwardFiles = /home/theoa/d/068_aptamer_benchmarking/reads/R8T.fasta

# Specifies the reader for the sequences depending on the input format (case sensitive).
# Current options are: FastqReader, RawReader
AptaplexParser.reader = FastqReader

# For paired-end data only. The smallest overlap required between the forward and
# reverse read when creating a single contig out of the two.
AptaplexParser.PairedEndMinOverlap = 15

# Maximal number of mutations in the overlapping region for a sequence to be accepted
AptaplexParser.PairedEndMaxMutations = 10

# Highest score of the current quality. 55 for phred model.
AptaplexParser.PairedEndMaxScoreValue = 55

# Maximal number of mutations allowed in the primer regions
AptaplexParser.PrimerTolerance = 5

Aptacluster.RandomizedRegionSize = 40

# The size of the locality sensitive hash dimension. It defines how many indices from the
# randomized region will be sampled during the process. Must be smaller than or equal to
# Aptacluster.RandomizedRegionSize
Aptacluster.LSHDimension = 30

# Defines the size of the k-mers that will be used during the motif
# extraction procedure of AptaTRACE. In other words, it defines the initial motif
# size
Aptatrace.KmerLength = 6

# Occasionally, motifs might co-occur within the same aptamer or aptamer family.
# In order to better understand this relationship, we have developed a post-processing
# add-on that uncovers these relationships. To activate this option, this parameter
# has to be set to True.
Aptatrace.FilterClusters = True

# If, in addition to the motifs, a list of all aptamers that contain the motif is to
# be saved in a separate file, set this parameter to true.
Aptatrace.OutputClusters = True

# AptaTRACE uses a background model to identify statistically significant changes
# in secondary structure contexts. This model is generated from aptamers which do
# not undergo selection and are therefore present in small numbers in the pools.
# The parameter alpha specifies which sequences should be included in the background
# model, i.e. all sequences whose number of occurrences is smaller than, or equal
# to this value are taken into account.
Aptatrace.Alpha = 10
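The `Aptacluster.LSHDimension` comment describes sampling a fixed number of positions from the randomized region, which is why it cannot exceed `Aptacluster.RandomizedRegionSize`. The sketch below illustrates generic position-sampling locality-sensitive hashing for Hamming distance; it shows the idea only, is not AptaCluster's implementation, and its function names are hypothetical. Sequences that agree at every sampled position land in the same bucket, and repeating the sampling over several iterations (compare `Aptacluster.LSHIterations : 5` in the earlier log) gives similar sequences more chances to collide.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def sample_positions(region_size: int, lsh_dimension: int, rng: random.Random) -> List[int]:
    """Pick lsh_dimension distinct positions; impossible if it exceeds region_size."""
    if lsh_dimension > region_size:
        raise ValueError("LSH dimension must be <= randomized region size")
    return sorted(rng.sample(range(region_size), lsh_dimension))


def lsh_buckets(regions: Sequence[str], positions: List[int]) -> Dict[str, List[int]]:
    """Group sequence indices by the bases found at the sampled positions."""
    buckets: Dict[str, List[int]] = defaultdict(list)
    for idx, region in enumerate(regions):
        key = "".join(region[p] for p in positions)
        buckets[key].append(idx)
    return dict(buckets)


def candidate_pairs(regions: Sequence[str], region_size: int, lsh_dimension: int,
                    iterations: int, seed: int = 0) -> Set[Tuple[int, int]]:
    """Index pairs that shared a bucket in at least one iteration."""
    rng = random.Random(seed)
    pairs: Set[Tuple[int, int]] = set()
    for _ in range(iterations):
        positions = sample_positions(region_size, lsh_dimension, rng)
        for members in lsh_buckets(regions, positions).values():
            for i in range(len(members)):
                for j in range(i + 1, len(members)):
                    pairs.add((members[i], members[j]))
    return pairs
```

Candidate pairs would then be checked with an exact distance (compare `Aptacluster.EditDistance : 5` in the log) before being merged into a cluster; a larger dimension makes buckets stricter, more iterations recover more near-duplicates.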

From: drivenbyentropy [mailto:notifications@github.com] Sent: Tuesday, 24 October 2017 11:26 AM To: drivenbyentropy/aptasuite aptasuite@noreply.github.com Cc: bioinformatics bioinformatics@deakin.edu.au; Author author@noreply.github.com Subject: Re: [drivenbyentropy/aptasuite] aptacluster error (#3)

ghost commented 6 years ago

Also getting this error during parsing:

Initializing parser AptaplexParser
Starting AptaPlex:
Parsing...
Total Reads: Accepted Reads: Contig Assembly Fails: Invalid Alphabet: 5' Primer Error: 3' Primer Error: Invalid Cycle: Total Primer Overlaps:
0 0 0 0 0 0 0
21914 52 0 0 0 21816 0
79072 217 0 0 1 78682 0
138808 406 0 0 5 138116 0
200479 607 0 0 5 199477 0
262090 786 0 0 7 260793 0
309529 963 0 0 7 307978 0
369137 1131 0 0 7 367292 0
437344 1351 0 0 9 435136 0
517846 1581 0 0 10 515249 0
579204 1748 0 0 11 576304 0
Exception in thread "AptaPlex Producer" java.lang.NullPointerException
        at lib.parser.aptaplex.FastqReader.getNextRead(FastqReader.java:128)
        at lib.parser.aptaplex.AptaPlexProducer.run(AptaPlexProducer.java:138)
        at java.lang.Thread.run(Thread.java:745)
592775 1796 0 0 11 589806 0
592775 1796 0 0 11 589806 0
592775 1796 0 0 11 589806 0
^Cjava.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1249)
        at java.lang.Thread.join(Thread.java:1323)
        at lib.parser.aptaplex.AptaPlexParser.parse(AptaPlexParser.java:56)
        at lib.parser.aptaplex.AptaPlexParser.run(AptaPlexParser.java:84)
        at java.lang.Thread.run(Thread.java:745)
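The trace points into `FastqReader.getNextRead`, while the configuration above sets `AptaplexParser.reader = FastqReader` for files named `.fasta`. Whether that format mismatch causes the exception is not confirmed in this thread, but it is cheap to check. The sketch below uses hypothetical helpers, not part of AptaSuite: it guesses a file's format from its first non-empty line (`@` for FASTQ, `>` for FASTA) and writes FASTA sequences out one per line. That one-sequence-per-line layout is an assumption about what `RawReader` accepts; check the AptaSuite wiki before relying on it.

```python
from typing import Iterable, Iterator, List


def guess_format(lines: Iterable[str]) -> str:
    """Classify by the first non-empty line: '@' FASTQ, '>' FASTA, otherwise raw."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("@"):
            return "fastq"
        if line.startswith(">"):
            return "fasta"
        return "raw"
    return "empty"


def fasta_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA record's sequence, joining wrapped sequence lines."""
    chunks: List[str] = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                yield "".join(chunks)
            chunks = []
            seen_header = True
        elif line:
            chunks.append(line)
    if seen_header:
        yield "".join(chunks)


def fasta_to_raw(in_path: str, out_path: str) -> int:
    """Write one sequence per line (assumed RawReader layout); return the number written."""
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for seq in fasta_sequences(src):
            dst.write(seq + "\n")
            count += 1
    return count
```

Running `guess_format(open(path))` on each file listed under `AptaplexParser.forwardFiles` shows quickly whether the configured reader matches the input.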

Dr Theo Allnutt Bioinformatics Research Fellow School of Medicine, Faculty of Health Waurn Ponds Campus 0352479571

[cid:image001.png@01D34CBD.A553D160] [cube-small]

Deakin University Locked Bag 20000, Geelong, VIC 3220 +61 3 524 79571 theo.allnutt@deakin.edu.aumailto:theo.allnutt@deakin.edu.au http://www.deakin.edu.au/health/faculty-research/bioinformatics-core-research-group https://bioinformatics-deakin.github.io/portal1/

Deakin University CRICOS Provider Code 00113B


From: drivenbyentropy [mailto:notifications@github.com]
Sent: Tuesday, 24 October 2017 11:26 AM
To: drivenbyentropy/aptasuite aptasuite@noreply.github.com
Cc: bioinformatics bioinformatics@deakin.edu.au; Author author@noreply.github.com
Subject: Re: [drivenbyentropy/aptasuite] aptacluster error (#3)

AptaSuite is a re-implementation of a collection of algorithms, not the original implementations referenced in the corresponding papers. As such, the issue tracker is not the place to discuss the data accompanying the original manuscripts.

As for the motif context traces, they are a visual representation of how likely a particular motif is to be found in a given secondary structure context in each of the selection cycles. Here, the contexts H, B, I, M, D, and P correspond to Hairpin, Bulge Loop, Inner Loop, Multiple Loop, Dangling End, and Paired. I have extended the wiki to state the meaning of these abbreviations in the appropriate sections.
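If you need the long names in your own scripts, the code-to-name pairs listed above can be kept in a small lookup table. This is a hypothetical Java sketch, not part of AptaSuite's API: the class name ContextCodes and the enum StructuralContext are made up for illustration, and only the six code/name pairs come from the reply.

```java
// Hypothetical lookup (not part of AptaSuite's API): expands the one-letter
// structural context codes shown in the motif context traces.
class ContextCodes {

    enum StructuralContext {
        H("Hairpin"),
        B("Bulge Loop"),
        I("Inner Loop"),
        M("Multiple Loop"),
        D("Dangling End"),
        P("Paired");

        private final String description;

        StructuralContext(String description) {
            this.description = description;
        }

        String description() {
            return description;
        }
    }

    public static void main(String[] args) {
        // Print every code with its long name, e.g. "H = Hairpin".
        for (StructuralContext context : StructuralContext.values()) {
            System.out.println(context.name() + " = " + context.description());
        }
    }
}
```

An enum keeps the set of codes closed, so ContextCodes.StructuralContext.valueOf("X") fails fast on an unknown code instead of silently returning nothing.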



drivenbyentropy commented 6 years ago

Hi,

Based on the file extension of your sequence files, they appear to be in fasta format. AptaSUITE currently only supports fastq format as input (I will add fasta support soon).
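A file's extension does not guarantee its content, so it can help to look at the first record directly: FASTA records begin with '>' and FASTQ records with '@'. The following is a minimal Java sketch under that assumption; SequenceFormatSniffer is a hypothetical name, it is not part of AptaSuite, and it only handles uncompressed text files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal sketch (not AptaSuite code): guess whether an uncompressed sequence
// file is FASTA or FASTQ from the first character of its first non-empty line.
class SequenceFormatSniffer {

    enum Format { FASTA, FASTQ, UNKNOWN }

    static Format sniff(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip leading blank lines
            }
            switch (line.charAt(0)) {
                case '>': return Format.FASTA; // FASTA header line
                case '@': return Format.FASTQ; // FASTQ header line
                default:  return Format.UNKNOWN;
            }
        }
        return Format.UNKNOWN; // empty file
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                System.out.println(path + ": " + sniff(reader));
            }
        }
    }
}
```

With Java 11 or later the file can be run directly, for example java SequenceFormatSniffer.java file1 file2, where the file names are placeholders for your own sequence files.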

If this does not solve the issue, please open another ticket, as this problem is unrelated to the original question.

Thanks!

ghost commented 6 years ago

OK, I will open another issue. I tried FASTQ too, with the same result.
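Since the parsing stack trace ends in FastqReader.getNextRead, one possible (unconfirmed) cause is a FASTQ file that does not follow the four-line record layout all the way through, for example a truncated last record. A hedged way to rule that out is to validate every record before parsing. This is a hypothetical Java sketch, not AptaSuite code, and it assumes unwrapped records with exactly one sequence line and one quality line each.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal sketch (not AptaSuite code): checks that a plain-text FASTQ file
// follows the four-line record layout
//   1. '@' + read identifier
//   2. sequence
//   3. '+' (optionally followed by the identifier again)
//   4. quality string, same length as the sequence
// Assumes unwrapped records (one line each for sequence and quality).
class FastqValidator {

    /** Returns null if the input is well-formed, otherwise a description of the first problem found. */
    static String firstProblem(BufferedReader reader) throws IOException {
        long record = 0;
        String header;
        while ((header = reader.readLine()) != null) {
            if (header.isEmpty()) {
                continue; // tolerate blank lines between records, e.g. at the end of the file
            }
            record++;
            String sequence = reader.readLine();
            String separator = reader.readLine();
            String quality = reader.readLine();
            if (!header.startsWith("@")) {
                return "record " + record + ": header does not start with '@'";
            }
            if (sequence == null || separator == null || quality == null) {
                return "record " + record + ": truncated (fewer than 4 lines)";
            }
            if (!separator.startsWith("+")) {
                return "record " + record + ": third line does not start with '+'";
            }
            if (sequence.length() != quality.length()) {
                return "record " + record + ": sequence and quality lengths differ";
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                String problem = firstProblem(reader);
                System.out.println(path + ": " + (problem == null ? "OK" : problem));
            }
        }
    }
}
```

The validator reads records in fixed groups of four lines rather than searching for '@', because '@' is also a legal character at the start of a quality line.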

