Error : cannot allocate vector of size 3.2 Gb

lyy005 commented 9 years ago

Hi Jim, PrimerTree is a very handy tool! Recently, when I ran the package, I got this error. I have attached the output here. Do you know if it's because there are too many BLAST hits?

Primer = search_primer_pair(name='Primer', 'CGAGAAGACCCTATGGAGCTTA', 'AATCGTTGAACAAACGAACC', num_aligns = 50000)
BLASTing 1 primer combinations
Submitting Primer-BLAST query
BLAST alignment processing, refreshing in 20 seconds...
BLAST alignment processing, refreshing in 20 seconds...
BLAST alignment processing, refreshing in 20 seconds...
BLAST alignment processing, refreshing in 20 seconds...
BLAST alignment processing, refreshing in 20 seconds...
BLAST alignment completed in 123 seconds
41599 BLAST alignments parsed in 539 seconds
taxonomy retrieved in 267 seconds
41599 sequences retrieved from NCBI in 7111 seconds, product length min:228 mean:339.02 max:658
41599 sequences aligned in 58704 seconds length:761
pairwise DNA distances calculated in 103 seconds
Error : cannot allocate vector of size 3.2 Gb

jimhester commented 9 years ago

I just updated the source code for the read_dna C functions in 2c0e129b93862fdc2bd9abc6bc13eafe752e3321, which should hopefully fix this error.

Please install that version from GitHub using:

devtools::install_github("jimhester/primerTree")

Then let me know if you still run into the error. I verified that it works on my machine using the primers above with num_aligns = 500. If you are still running into an error, I would try lowering that parameter from 50000; that many alignments may be too large to produce the multiple alignment in reasonable time and space.
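
To get a rough sense of why 50000 alignments strains memory, the sketch below estimates the storage needed for one value per pair of sequences. It assumes that the failing step holds such a pairwise structure, which the thread does not confirm. `pairwise_gib` is a hypothetical helper, not part of primerTree.

```r
# Hypothetical helper: size in GiB of one value per unordered pair of n sequences.
# bytes_per_value = 8 for doubles, 4 for integers.
pairwise_gib <- function(n, bytes_per_value = 8) {
  n_pairs <- n * (n - 1) / 2   # number of unordered pairs
  n_pairs * bytes_per_value / 2^30
}

n_seqs <- c(500, 5000, 41599)  # 41599 is the sequence count from the log above
data.frame(
  n_seqs    = n_seqs,
  double_gb = round(pairwise_gib(n_seqs, 8), 3),
  int_gb    = round(pairwise_gib(n_seqs, 4), 3)
)
```

Because the pair count grows quadratically, 41599 sequences need roughly 7000 times the pairwise storage of 500. At 4 bytes per value the estimate for 41599 sequences is about 3.2 GiB, close to the reported figure, although that match may be coincidental.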

lyy005 commented 9 years ago

I really appreciate your help on this. I'm using num_aligns = 50000 because I want to build a reference database for fish species, but there are tons of duplicated mammal sequences, so I have to save more BLAST hits and remove the mammal sequences afterwards. Do you know how I can run the primer search on a subgroup of the NR database?

Here's what I got from my primer set; more than 95% of the sequences are from mammals.

21132 "Mammalia"
954 "Actinopteri"
26 "Amphibia"
19 "Chondrichthyes"
3 "Cladistia"
83 NA
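
Removing the unwanted classes after the search can be sketched in base R. The data frame below is toy data, and its `species` and `class` column names are assumptions about what a taxonomy table might look like, so adjust them to the actual object.

```r
# Toy taxonomy table; column names are assumed for illustration only.
taxonomy <- data.frame(
  species = c("Homo sapiens", "Danio rerio", "Salmo salar",
              "Mus musculus", "Xenopus laevis"),
  class   = c("Mammalia", "Actinopteri", "Actinopteri",
              "Mammalia", "Amphibia"),
  stringsAsFactors = FALSE
)

# Count hits per class, keeping NA as its own category if present.
table(taxonomy$class, useNA = "ifany")

# Keep only the fish classes of interest; rows with NA class are dropped.
fish_classes <- c("Actinopteri", "Chondrichthyes", "Cladistia")
fish <- taxonomy[taxonomy$class %in% fish_classes, ]
fish
```

This only trims the results afterwards. The search still has to retrieve and align every hit, which is why restricting the database up front is the more attractive option.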

jimhester commented 9 years ago

Sure, use a CUSTOM_DB parameter with the GI numbers you want to search. You should be able to get all the GIs you need from a taxonomy search like http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=7777&lvl=3&lin=f&keep=1&srchmode=1&unlock


lyy005 commented 9 years ago

Jim, I've found all the GIs I need for primer searching. I'm not familiar with R. Is CUSTOM_DB an argument of primerTree? Would you show me how I can feed these GI numbers to primerTree?

Thank you
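
For readers new to R, loading a list of GI numbers into a character vector might look like the following. The file name `gis.txt` and the one-GI-per-line layout are assumptions. The sketch writes a small sample file with placeholder numbers so that it runs on its own. How the vector is then passed to the search is a separate question.

```r
# Write a small sample file so the sketch is self-contained.
# The numbers are placeholders, not real GI identifiers.
gi_file <- file.path(tempdir(), "gis.txt")
writeLines(c("12345", " 67890 ", "", "12345", "not_a_gi"), gi_file)

# Read one entry per line and strip surrounding whitespace.
gis <- trimws(readLines(gi_file))

# Keep only purely numeric entries, then drop duplicates.
gis <- unique(gis[grepl("^[0-9]+$", gis)])
gis
```

Keeping the identifiers as character strings avoids problems with values too large for R's 32-bit integers.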

lyy005 commented 9 years ago

Hi, I tried the custom_db parameter in search_primer_pair and it said this parameter wasn't found. Could you give me any advice on this?

test = search_primer_pair(name='test', 'GCCCCTCAGAATGATATTTGTCCTCA', 'AAAAACCACCGTTGTTATTCAACTA', num_aligns = 50, custom_db = gis)
BLASTing 1 primer combinations
name type defval
1 SEQFILE file
2 PRIMER5_START text
3 PRIMER5_END text
4 PRIMER3_START text
5 PRIMER3_END text
6 PRIMER_LEFT_INPUT text
7 PRIMER_RIGHT_INPUT text
8 PRIMER_PRODUCT_MIN text 70
9 PRIMER_PRODUCT_MAX text 1000
10 PRIMER_NUM_RETURN text 10
11 PRIMER_MIN_TM text 57.0
12 PRIMER_OPT_TM text 60.0
13 PRIMER_MAX_TM text 63.0
14 PRIMER_MAX_DIFF_TM text 3
15 PRIMER_ON_SPLICE_SITE dropdown 0
16 SPLICE_SITE_OVERLAP_5END text 7
17 SPLICE_SITE_OVERLAP_3END text 4
18 SPAN_INTRON checkbox
19 MIN_INTRON_SIZE text 1000
20 MAX_INTRON_SIZE text 1000000
21 SEARCH_SPECIFIC_PRIMER checkbox on
22 SEARCHMODE dropdown 0
23 PRIMER_SPECIFICITY_DATABASE dropdown refseq_mrna
24 CUSTOMSEQFILE file
25 ORGANISM text Homo sapiens
29 AddOrg button
30 EXCLUDE_XM checkbox
31 EXCLUDE_ENV checkbox
32 ENTREZ_QUERY text
33 TOTAL_PRIMER_SPECIFICITY_MISMATCH dropdown 1
34 PRIMER_3END_SPECIFICITY_MISMATCH dropdown 1
35 MISMATCH_REGION_LENGTH dropdown 5
36 TOTAL_MISMATCH_IGNORE dropdown 6
37 PRODUCT_SIZE_DEVIATION dropdown 4000
38 ALLOW_TRANSCRIPT_VARIANTS checkbox
39 NEWWIN checkbox
40 SHOW_SVIEWER checkbox on
41 HITSIZE dropdown 50000
43 EVALUE dropdown 30000
44 WORD_SIZE dropdown 7
45 MAX_CANDIDATE_PRIMER dropdown 500
46 NUM_TARGETS text 20
47 NUM_TARGETS_WITH_PRIMERS text 1000
48 MAX_TARGET_PER_TEMPLATE text 100
49 PRODUCT_MIN_TM text
50 PRODUCT_OPT_TM text
51 PRODUCT_MAX_TM text
52 PRIMER_MIN_SIZE text 15
53 PRIMER_OPT_SIZE text 20
54 PRIMER_MAX_SIZE text 25
55 PRIMER_MIN_GC text 20.0
56 PRIMER_MAX_GC text 80.0
57 GC_CLAMP text 0
58 POLYX text 5
59 PRIMER_MAX_END_STABILITY text 9
60 PRIMER_MAX_END_GC text 5
61 TH_OLOGO_ALIGNMENT checkbox
62 TH_TEMPLATE_ALIGNMENT checkbox
63 PRIMER_MAX_TEMPLATE_MISPRIMING_TH text 40.00
64 PRIMER_PAIR_MAX_TEMPLATE_MISPRIMING_TH text 70.00
65 PRIMER_MAX_SELF_ANY_TH text 45.0
66 PRIMER_MAX_SELF_END_TH text 35.0
67 PRIMER_PAIR_MAX_COMPL_ANY_TH text 45.0
68 PRIMER_PAIR_MAX_COMPL_END_TH text 35.0
69 PRIMER_MAX_HAIRPIN_TH text 24.0
70 PRIMER_MAX_TEMPLATE_MISPRIMING text 12.00
71 PRIMER_PAIR_MAX_TEMPLATE_MISPRIMING text 24.00
72 SELF_ANY text 8.00
73 SELF_END text 3.00
74 PRIMER_PAIR_MAX_COMPL_ANY text 8.00
75 PRIMER_PAIR_MAX_COMPL_END text 3.00
76 EXCLUDED_REGIONS text
77 OVERLAP text
78 OVERLAP_5END text 7
79 OVERLAP_3END text 4
80 MONO_CATIONS text 50.0
81 DIVA_CATIONS text 1.5
82 CON_DNTPS text 0.6
83 SALT_FORMULAR dropdown 1
84 TM_METHOD dropdown 1
85 CON_ANEAL_OLIGO text 50.0
86 NO_SNP checkbox
87 PRIMER_MISPRIMING_LIBRARY dropdown AUTO
88 LOW_COMPLEXITY_FILTER checkbox on
89 PICK_HYB_PROBE checkbox
90 PRIMER_INTERNAL_OLIGO_MIN_SIZE text 18
91 PRIMER_INTERNAL_OLIGO_OPT_SIZE text 20
92 PRIMER_INTERNAL_OLIGO_MAX_SIZE text 27
93 PRIMER_INTERNAL_OLIGO_MIN_TM text 57.0
94 PRIMER_INTERNAL_OLIGO_OPT_TM text 60.0
95 PRIMER_INTERNAL_OLIGO_MAX_TM text 63.0
96 PRIMER_INTERNAL_OLIGO_MIN_GC text 20.0
97 PRIMER_INTERNAL_OLIGO_OPT_GC_PERCENT text 50
98 PRIMER_INTERNAL_OLIGO_MAX_GC text 80.0
99 NEWWIN checkbox
100 SHOW_SVIEWER checkbox on

Error : CUSTOM_DB not valid option
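
The error message suggests that argument names are upper-cased and compared against the form field names printed in the table. A minimal sketch of that kind of check follows. `check_options` is a hypothetical helper and the short `valid_names` vector is copied from a few rows of the table, so this illustrates the observed behaviour rather than reproducing primerTree's code.

```r
# A few field names copied from the printed option table.
valid_names <- c("PRIMER_SPECIFICITY_DATABASE", "CUSTOMSEQFILE",
                 "ORGANISM", "ENTREZ_QUERY", "HITSIZE")

# Hypothetical helper: return the upper-cased names that are not form fields.
check_options <- function(option_names, valid = valid_names) {
  upper <- toupper(option_names)
  upper[!upper %in% valid]
}

check_options(c("custom_db", "organism"))
```

Here `custom_db` is reported as unknown, while `organism` matches a listed field. Whether passing a listed field such as ORGANISM or ENTREZ_QUERY actually restricts the search is not confirmed in this thread.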

jimhester commented 9 years ago

I think the issue is that custom_db is only an option when you select custom from the dropdown on http://www.ncbi.nlm.nih.gov/tools/primer-blast/. You may be better off running the query manually and then parsing the results with PrimerTree, although you will probably have to do some programming yourself to get this working.

Sorry I don't have a better response for you on this. I agree this seems like a useful feature.
