dib-lab / khmer

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
http://khmer.readthedocs.io/

khmer version 2.0 ERROR: I/O operation on closed file #1693

Open JifengTang opened 7 years ago

JifengTang commented 7 years ago

I have read issues https://github.com/dib-lab/khmer/issues/1320 and https://github.com/dib-lab/khmer/issues/1341, but I did not find a solution there. I really need to process a dataset quickly.

The command I used is below; the job ran on a computer with 2 TB of memory.

nohup /.. /khmer/khmerEnv/bin/normalize-by-median.py -o RNAseqNormalized.fastq -C 100 -s Kmer.tables -R RNAseq_Report -M 1500000000000 ../RNAseqInput/*fastq >process.out &

Installation:

mkdir khmer
sudo apt-get install python2.7-dev python-virtualenv python-pip gcc g++
cd khmer/
curl -O https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.11.6.tar.gz
tar xzf virtualenv
cd virtualenv-; python2.7 virtualenv.py ../khmerEnv; cd ..
source khmerEnv/bin/activate
pip2 install khmer

(khmerEnv)$ normalize-by-median.py -h

This is the script normalize-by-median.py in khmer.
You are running khmer version 2.0
You are also using screed version 1.0
If you use this script in a publication, please cite EACH of the following:
* MR Crusoe et al., 2015. http://dx.doi.org/10.12688/f1000research.6924.1
* CT Brown et al., arXiv:1203.4802 [q-bio.GN]
Please see http://khmer.readthedocs.org/en/latest/citations.html for details.

usage: normalize-by-median.py [-h] [--version] [--ksize KSIZE] [--n_tables N_TABLES] [-U UNIQUE_KMERS] [--fp-rate FP_RATE] [--max-tablesize MAX_TABLESIZE | -M MAX_MEMORY_USAGE] [-q] [-C CUTOFF] [-p] [--force_single] [-u unpaired_reads_filename] [-s filename] [-R report_filename] [--report-frequency report_frequency] [-f] [-o filename] [-l filename] [--gzip | --bzip] input_sequence_filename [input_sequence_filename ...]

Do digital normalization (remove mostly redundant sequences)

positional arguments:
  input_sequence_filename  Input FAST[AQ] sequence filename.

ctb commented 7 years ago

Hi Jifeng, please try:

https://github.com/dib-lab/khmer/archive/master.zip

to get the latest master branch of khmer.

best, --titus

JifengTang commented 7 years ago

Dear Titus,

How should I install it? Because I have version 2.0 installed.

Can I simply copy “normalize-by-median.py” to the folder “khmerEnv/bin/” ?

Thank you very much.

Cheers, Jifeng


Keygene N.V. | P.O. Box 216 | 6700 AE Wageningen | The Netherlands | T (+31) 317 46 68 66 | F (+31) 317 42 49 39 | CoC. 09066631 | http://www.keygene.com/


ctb commented 7 years ago

Sorry, I didn't give the complete command!

pip install https://github.com/dib-lab/khmer/archive/master.zip

will work. You cannot just copy normalize-by-median, I'm afraid ;).
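After installing from the master archive, it may be worth confirming that the virtualenv's interpreter actually sees the new build rather than the old 2.0 release. A minimal sketch, assuming Python 3.8 or later (`importlib.metadata` is not available under the python2.7 setup shown earlier in this thread); the helper name `installed_version` is hypothetical:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the interpreter inside khmerEnv, not the system one.
    print("khmer:", installed_version("khmer"))
```

If this still reports 2.0, the pip that ran the install probably belongs to a different environment than the one running the script.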

JifengTang commented 7 years ago

Dear Titus,

It seems to be working. Of 78 FASTQ files in total, about half have been processed.

I used:

nohup /data/sag2/2017/JTA_tools/khmerupdate/khmerEnv/bin/normalize-by-median.py -o RNAseqNormalized.fastq -C 100 -s Kmerupdate.tables -R RNAseq_Reportupdate -M 1800000000000 ../RNAseqInput/*fastq >processupdate.out &

For the "-C" option, the default is 20. I changed it to 100, although I am not sure that I should have.

I want to keep at least 100x coverage per transcript.

Is the "-C" cutoff applied to the whole dataset or per FASTQ file?

Thank you very much.

Cheers, Jifeng

-C CUTOFF, --cutoff CUTOFF  when the median k-mer coverage level is above this number the read is not kept. (default: 20)


ctb commented 7 years ago

Hi Jifeng,

excellent.

-C is for the whole data set.

best, --titus
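To illustrate why the cutoff applies to the whole data set: the idea behind digital normalization can be sketched with a single k-mer count table that persists across all input files. This is a toy illustration, not khmer's implementation. It uses an exact Python dict where khmer uses a memory-bounded probabilistic table, the function names and the list-of-lists input are hypothetical, and treating a median equal to the cutoff as "not kept" is an assumption about the boundary case.

```python
from collections import defaultdict
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize_by_median(files, k=20, cutoff=20):
    """Keep a read only while its median k-mer count is below cutoff.

    files: iterable of read lists, one list per input file
    (a hypothetical stand-in for parsed FASTQ files).
    """
    counts = defaultdict(int)  # ONE table shared by every input file
    kept = []
    for reads in files:
        for read in reads:
            read_kmers = list(kmers(read, k))
            if not read_kmers:
                continue  # read shorter than k: nothing to count
            med = median(counts[km] for km in read_kmers)
            if med < cutoff:
                kept.append(read)
                # Only kept reads contribute to the counts.
                for km in read_kmers:
                    counts[km] += 1
    return kept
```

With two "files" that each hold three copies of the same read and `cutoff=2`, this sketch keeps two reads in total, not two per file, because the counts from the first file are still in the table when the second file is read.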
