Hi,
Thank you for pointing out the typo in the usage description; I have corrected it accordingly.
As for the error, it looks like you are missing a parameter in your configuration file, namely
Aptacluster.RandomizedRegionSize
Please have a look at the AptaCluster Wiki (https://github.com/drivenbyentropy/aptasuite/wiki/Clustering-aptamers-by-sequence-similarity) under "Mandatory Configuration File Parameters" for further details.
Thanks!
Thanks. Seems to be working now. I have a couple of other questions. First, I have contacted your lead author about an apparent mistake in the SRA archive of reads related to aptasuite; please see the message below. Second, can you explain the 'motif context trace' diagrams? I cannot find an explanation of the meaning of P, H, I, etc. I guess they represent the changing frequency of the motif in some way.
Theo
Hi again,
I've noticed that the reason I didn't find any round 5 reads is that, in the SRA information, the index sequences for round 2 and round 5 are the same, but in SRX1653575 the sequence is labelled 'round 2' and the round 5 index is not given.
Round2: CGTGGGAGAGAGGAAGAGGGATGGG-N40-CGACGACTCGCTGAGATCGAGAATC
Can you please let me know the index used for round 5?
The reads appear to have the following forward indices:
TGAAG unknown
GGGAG unknown
CGTGG round 2/5
CCGGG unknown
ATCCG unknown
ACAGG round 3
GCCGG round 4
CATGG unknown
AAAGG unknown
Thanks,
Theo
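The index tally above can be reproduced with a short script. The following is only a minimal sketch, assuming that the forward index is the first five bases of each read and that matching is exact (zero mismatches, as in the demultiplexing described in the message below). The function names and the two example records are made up for illustration and are not part of AptaSuite or usearch.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, TextIO


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # second line of every record
            yield line.strip()


def count_prefixes(sequences: Iterable[str], length: int = 5) -> Counter:
    """Count exact (zero-mismatch) read prefixes of the given length."""
    return Counter(seq[:length] for seq in sequences if len(seq) >= length)


# Tiny in-memory example; both records are hypothetical.
example = io.StringIO(
    "@read1\nCGTGGGAGAG\n+\nIIIIIIIIII\n"
    "@read2\nACAGGTTTTT\n+\nIIIIIIIIII\n"
)
prefix_counts = count_prefixes(fastq_sequences(example))
for prefix, count in prefix_counts.most_common():
    print(prefix, count)
```

For a real file, the `io.StringIO` object would be replaced by an opened FASTQ file handle.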
Dear Teresa,
I am attempting to utilise your SRA data as cited in the publication "Large scale analysis of the mutational landscape in HT-SELEX improves aptamer discovery", Nucleic Acids Res. 2015 Jul 13; 43(12): 5699–5707, as a reference dataset for benchmarking aptamer design tools. However, using the barcode information contained in the SRA (https://www.ncbi.nlm.nih.gov/sra/SRX1653575[accn]), I cannot find any sequences from round 5 of the SELEX as described in your paper(s). The data appears to be divided as follows:
SRR3279660.fastq; round 2= 4864618 reads; round 3 = 5314853; round 4 = 11381194 reads; round 5 = zero reads
SRR3279661.fastq; round 2 = 5699489 reads; round 3 = 8065185; round 4 = 5388503 reads; round 5 = zero reads
I used no errors in demuxing barcodes (usearch 10, fastx_demux command).
Also, the data appears to be single-ended, not paired as stated in the SRA.
Can you please advise?
Thanks for your help,
Theo Allnutt
Dr Theo Allnutt Bioinformatics Research Fellow School of Medicine, Faculty of Health Waurn Ponds Campus 0352479571
Deakin University Locked Bag 20000, Geelong, VIC 3220 +61 3 524 79571 theo.allnutt@deakin.edu.au http://www.deakin.edu.au/health/faculty-research/bioinformatics-core-research-group https://bioinformatics-deakin.github.io/portal1/
Deakin University CRICOS Provider Code 00113B
Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone.
Deakin University does not warrant that this email and any attachments are error or virus free.
From: drivenbyentropy [mailto:notifications@github.com] Sent: Tuesday, 24 October 2017 5:55 AM To: drivenbyentropy/aptasuite aptasuite@noreply.github.com Cc: bioinformatics bioinformatics@deakin.edu.au; Author author@noreply.github.com Subject: Re: [drivenbyentropy/aptasuite] aptacluster error (#3)
AptaSuite is a re-implementation of a collection of algorithms, not the original implementations referenced in the corresponding papers. As such, the issue tracker is not the place to discuss the data accompanying the original manuscripts.
The motif context traces are a visual representation of how likely a particular motif is to be found in a given secondary structure context in each of the selection cycles. Here, the contexts H, B, I, M, D, and P correspond to Hairpin, Bulge Loop, Inner Loop, Multiple Loop, Dangling End, and Paired. I have extended the wiki to state the meaning of these abbreviations in the appropriate sections.
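For quick reference when reading the trace plots, the abbreviations listed above can be kept in a small lookup table. This is just an illustrative sketch; the names `STRUCTURE_CONTEXTS` and `describe_contexts` are hypothetical and do not come from the AptaSuite code base.

```python
from typing import Dict, List

# Context abbreviations as given in the reply above.
STRUCTURE_CONTEXTS: Dict[str, str] = {
    "H": "Hairpin",
    "B": "Bulge Loop",
    "I": "Inner Loop",
    "M": "Multiple Loop",
    "D": "Dangling End",
    "P": "Paired",
}


def describe_contexts(codes: str) -> List[str]:
    """Expand a string of one-letter context codes into their full names."""
    return [STRUCTURE_CONTEXTS[code] for code in codes]


print(describe_contexts("PHI"))
```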
Thanks for the explanation.
One further issue: I seem to be getting a very high rate of errors on my reverse primer, although it looks fine in the data. An example from the log:
Total Reads: Accepted Reads: Contig Assembly Fails: Invalid Alphabet: 5' Primer Error: 3' Primer Error: Invalid Cycle: Total Primer Overlaps:
21907 52 0 0 0 21812 0 90734
I am trying to import data in fasta format that is already merged (contigs) and therefore single-ended. e.g.
R8T.7406324 TCACCGCCCATTTCCTGGGGGGGAAGGAATGTTATAGGTTTGGTAGGAGGACGGTTCCATCCTGAGGCGCAGT
R8T.5935025 TCACCGCCCATTTCCCATCTCAGCCGCTGCGTTGGGTAGGGGAAAAAGGGGTGCATCCATCCTGAGGCGCAG
R8T.5870928 TCACCGCCCATTTCCCATGCACGGGGGGGATGAATACGGACCTCGAGGGAGTGGGTGCATCCTGAGGCGCAGT
R8T.5870929 TCACCGCCCATTTCCCATAGAGGGGGAAGGCAGTAGTGAAGCGGCGTGGCCTACAATCCATCCTGAGGCGCAGT
R8T.5870927 TCACCTCCCATTTCCGGGGGAGACAAAAGAGTGCTCCTGGGGTTCCGTCCATGGGTCCATCCTGAGGCGCAGT
R8T.5870924 TCACCGCCCATTTCCGCGAGGGTCGGCATGATCGCAGCGGGGGAGTAAGGCTCCGTCCATCCTGAGGCGCAGT
R8T.5935024 TCACCGCCCATTTCCAACGAACCGCGCTTAAGATTCCTTACCGTTCCAGTATCGCTCCATCCTGAGGCGCAGT
R8T.5870923 TCACCGCCCATTTCCACGTCGGAGGGGGACAGGGTTAGATTTAATTAGGGCCAGTCCATCCTGAGGCGCAGT
N.B. I always get some 1-2 bp wobble in the aptamer read length in real data.
The config file is below. As you can see, the reverse primer is always present, but the parsing seems to think there is an error?
Thanks,
Theo
Experiment.name = "SELEX against target Cp"
Experiment.description = "round R6 NEG R8T R83"
Experiment.projectPath = ./
Experiment.primer5 = TCACCGCCCATTTCC
Experiment.primer3 = TGCGCCTCAGGATGGA
SelectionCycle.name = R6
SelectionCycle.round = 6
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

SelectionCycle.name = NEG
SelectionCycle.round = 7
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = True

SelectionCycle.name = 8T
SelectionCycle.round = 8
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

SelectionCycle.name = 83
SelectionCycle.round = 9
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False
Performance.maxNumberOfCores = 12
AptamerPool.backend = MapDBAptamerPool
SelectionCycle.backend = MapDBSelectionCycle
StructurePool.backend = MapDBStructurePool
AptaplexParser.isPerFile = True
AptaplexParser.forwardFiles = /home/theoa/d/068_aptamer_benchmarking/reads/NEG.fasta
AptaplexParser.forwardFiles = /home/theoa/d/068_aptamer_benchmarking/reads/R6.fasta
AptaplexParser.forwardFiles = /home/theoa/d/068_aptamer_benchmarking/reads/R83.fasta
AptaplexParser.forwardFiles = /home/theoa/d/068_aptamer_benchmarking/reads/R8T.fasta
AptaplexParser.reader = FastqReader
AptaplexParser.PairedEndMaxScoreValue = 55
AptaplexParser.PrimerTolerance = 5
Aptacluster.RandomizedRegionSize = 40
Aptacluster.LSHDimension = 30
Aptatrace.KmerLength = 6
Aptatrace.FilterClusters = True
Aptatrace.OutputClusters = True
Aptatrace.Alpha = 10
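One thing that can be checked independently of AptaSuite is the orientation in which the configured 3' primer occurs in the reads. The sketch below tests the first read quoted in this message against the `Experiment.primer3` value from the config. The helper names are hypothetical. For that read, the primer is not found as written and is found only as its reverse complement. Whether this explains the 3' primer errors depends on the orientation AptaSuite expects, which this thread does not state.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def primer_orientation(read: str, primer: str) -> str:
    """Report how the primer occurs in the read, if at all."""
    if primer in read:
        return "as_given"
    if reverse_complement(primer) in read:
        return "reverse_complement"
    return "absent"


# Values copied from the config and the first example read in this message.
PRIMER3 = "TGCGCCTCAGGATGGA"
READ = (
    "TCACCGCCCATTTCCTGGGGGGGAAGGAATGTTATAGGTTTGGTAGGAGGACGG"
    "TTCCATCCTGAGGCGCAGT"
)

orientation = primer_orientation(READ, PRIMER3)
print(orientation)
```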
Also getting this error during parsing:
Initializing parser AptaplexParser
Starting AptaPlex: Parsing...
Total Reads: Accepted Reads: Contig Assembly Fails: Invalid Alphabet: 5' Primer Error: 3' Primer Error: Invalid Cycle: Total Primer Overlaps:
0 0 0 0 0 0 0
21914 52 0 0 0 21816 0
79072 217 0 0 1 78682 0
138808 406 0 0 5 138116 0
200479 607 0 0 5 199477 0
262090 786 0 0 7 260793 0
309529 963 0 0 7 307978 0
369137 1131 0 0 7 367292 0
437344 1351 0 0 9 435136 0
517846 1581 0 0 10 515249 0
579204 1748 0 0 11 576304 0
Exception in thread "AptaPlex Producer" java.lang.NullPointerException
    at lib.parser.aptaplex.FastqReader.getNextRead(FastqReader.java:128)
    at lib.parser.aptaplex.AptaPlexProducer.run(AptaPlexProducer.java:138)
    at java.lang.Thread.run(Thread.java:745)
592775 1796 0 0 11 589806 0
592775 1796 0 0 11 589806 0
592775 1796 0 0 11 589806 0
^Cjava.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    at java.lang.Thread.join(Thread.java:1323)
    at lib.parser.aptaplex.AptaPlexParser.parse(AptaPlexParser.java:56)
    at lib.parser.aptaplex.AptaPlexParser.run(AptaPlexParser.java:84)
    at java.lang.Thread.run(Thread.java:745)
Hi,
Based on the file extension of your sequence files, they appear to be in fasta format. AptaSUITE currently only supports fastq format as input (I will add fasta support soon).
If this does not solve the issue, please open another ticket as this problem is unrelated to the original question.
Thanks!
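Until fasta input is supported, one possible workaround is to convert the merged fasta reads to fastq with a placeholder quality string. The following is a minimal sketch, assuming standard fasta records with `>` header lines. The function names, the example record, and the constant quality character `I` are assumptions, and the resulting qualities carry no real information.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA handle, joining wrapped lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(fasta: TextIO, fastq: TextIO, quality_char: str = "I") -> int:
    """Write each FASTA record as FASTQ with a constant placeholder quality."""
    count = 0
    for name, seq in read_fasta(fasta):
        fastq.write(f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n")
        count += 1
    return count


# In-memory example with one hypothetical record.
source = io.StringIO(">R8T.example\nTCACCGCCCATTTCC\n")
target = io.StringIO()
converted = fasta_to_fastq(source, target)
print(converted, "record(s) converted")
```

With real data, `source` and `target` would be file handles opened for reading and writing respectively.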
OK, I will open another issue. I tried fastq too, with the same result.
Hi,
First, I think the example command line in the manual is wrong: https://github.com/drivenbyentropy/aptasuite/wiki/Basic-Usage
it shows:
but should be?:
Using the following command, I get the error below:
The parsing completes ok then:
My config file:
Log file: