NAL-i5K / NAL_RNA_seq_annotation_pipeline

try to move on from six.moves #36

Open mpoelchau opened 4 years ago

mpoelchau commented 4 years ago

Do we still need the six.moves library for urlretrieve?

https://github.com/NAL-i5K/NAL_RNA_seq_annotation_pipeline/blob/update-rnannot/rnannot/RNAseq_annotate.py#L12

This might be a moot issue if we figure out a different, more reliable way to download SRA data.
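On Python 3, `six.moves.urllib.request.urlretrieve` is only an alias for the standard library's `urllib.request.urlretrieve`. The six dependency could therefore be dropped by importing it directly. This is a minimal sketch that assumes the pipeline no longer needs Python 2 support. `download_file` is a hypothetical wrapper and does not exist in RNAseq_annotate.py.

```python
# Python 3 stdlib equivalent of: from six.moves.urllib.request import urlretrieve
from urllib.request import urlretrieve
from pathlib import Path


def download_file(url: str, dest: str) -> Path:
    """Hypothetical helper: fetch `url` into the local file `dest`."""
    # urlretrieve returns a (local_filename, headers) tuple
    filename, _headers = urlretrieve(url, dest)
    return Path(filename)
```

The call signature matches the six.moves version, so existing call sites would only need the import line changed.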

HsiuKangHuang commented 4 years ago

We can use wget, curl, or urlretrieve to download SRA files from NCBI, but these methods are not recommended. Downloading over HTTPS is very slow, and many SRA runs will not work because the reference genome used to compress them is missing (http://www.metagenomics.wiki/tools/short-read/ncbi-sra-file-format/wget-download). In that case we may get an error like: "name not found while resolving tree within virtual file system module - failed"

A better way to download SRA files is to use prefetch or fasterq-dump (both are included in sratoolkit).
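The two sratoolkit steps can be sketched from Python with `subprocess`: `prefetch` downloads the run, then `fasterq-dump` converts it to FASTQ. This sketch is not the pipeline's actual code. `sra_commands` and `fetch_fastq` are hypothetical names. Apart from `-O` (fasterq-dump's output-directory option), any extra flags such as thread count or split mode are left out.

```python
import shutil
import subprocess
from typing import List


def sra_commands(accession: str, outdir: str = ".") -> List[List[str]]:
    """Build the prefetch and fasterq-dump command lines for one SRA run."""
    return [
        # prefetch downloads the run's .sra data to local storage
        ["prefetch", accession],
        # fasterq-dump converts the run to FASTQ; -O sets the output directory
        ["fasterq-dump", accession, "-O", outdir],
    ]


def fetch_fastq(accession: str, outdir: str = ".") -> None:
    """Run both steps in order, stopping at the first failure."""
    for cmd in sra_commands(accession, outdir):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH; is sratoolkit installed?")
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(cmd, check=True)
```

Usage would look like `fetch_fastq("SRR000001", outdir="fastq")`, where the accession is only a placeholder. Passing argument lists instead of a shell string avoids quoting problems with accessions or paths.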

HsiuKangHuang commented 4 years ago

I found a way to solve this problem. Before using prefetch and fasterq-dump, I need to make sure that the sratoolkit configuration has remote access enabled (https://ncbi.github.io/sra-tools/install_config.html). Run vdb-config -i and enable remote access: press 'M' to go to the main page, then press 'E' to enable it, and finally press 's' and 'x' to save and exit the configuration page.
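A pipeline could check this setting before calling prefetch and point users to `vdb-config -i` if it is off. The sketch below makes two assumptions that should be verified against your sra-tools version. First, `vdb-config -i` stores settings in `~/.ncbi/user-settings.mkfg` as `key = "value"` lines. Second, disabled remote access appears there as `/repository/remote/disabled = "true"`. `parse_mkfg` and `remote_access_enabled` are hypothetical helpers.

```python
import re
from typing import Dict

# ASSUMPTION: settings lines look like  /path/to/key = "value"
_LINE = re.compile(r'^\s*(/[^\s=]+)\s*=\s*"([^"]*)"\s*$')


def parse_mkfg(text: str) -> Dict[str, str]:
    """Parse key = "value" lines from the text of a user-settings.mkfg file."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def remote_access_enabled(text: str) -> bool:
    """ASSUMPTION: remote access counts as enabled unless the disabled flag is "true"."""
    return parse_mkfg(text).get("/repository/remote/disabled") != "true"
```

A caller would read the file (for example `Path.home() / ".ncbi" / "user-settings.mkfg"`). If `remote_access_enabled` returns False, it would stop with a message telling the user to run `vdb-config -i`.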

mpoelchau commented 4 years ago

It works! I'd just suggest removing the six.moves requirement - https://github.com/NAL-i5K/NAL_RNA_seq_annotation_pipeline/blob/update-rnannot/rnannot/RNAseq_annotate.py#L12