StevenWingett / FastQ-Screen

Detecting contamination in NGS data and multi-species analysis
https://stevenwingett.github.io/FastQ-Screen/
GNU General Public License v3.0

--get_genomes download failed #47

Closed trfeuerborn closed 2 years ago

trfeuerborn commented 2 years ago

Hi,

I am trying to download the pre-built Bowtie2 indices using fastq_screen --get_genomes.

However, I have been receiving the following error:

"Connecting to ftp1.babraham.ac.uk (ftp1.babraham.ac.uk)|149.155.133.2|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-01-11 15:13:35 ERROR 403: Forbidden.

Could not run command 'wget --no-check-certificate -r --no-parent -R 'index.html*' ftp1.babraham.ac.uk/ftpusr46/FastQ_ScreenGenomes/'"

Is this a known problem, and is there a fix or an alternative way of downloading the dataset?

Thanks for the help

Kind regards,

Tatiana
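
The wget log in this report carries the key diagnostic: the server accepted the connection but refused the request (403), which points at the server rather than the local setup. A minimal sketch, assuming the wget output has been captured as text, of pulling the HTTP status out of such a log. The function name `wget_http_error` is hypothetical and not part of FastQ Screen.

```python
import re

def wget_http_error(log_text):
    """Return (status_code, reason) from a wget 'ERROR NNN: Reason.' line, or None."""
    match = re.search(r"ERROR (\d{3}): ([^.\n]+)\.", log_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()

# Log text copied from the report in this thread.
log = (
    "Connecting to ftp1.babraham.ac.uk (ftp1.babraham.ac.uk)|149.155.133.2|:80... connected.\n"
    "HTTP request sent, awaiting response... 403 Forbidden\n"
    "2022-01-11 15:13:35 ERROR 403: Forbidden.\n"
)
print(wget_http_error(log))
```

A 403 or 404 here means the files are unavailable on the server, so re-running the command locally is unlikely to help until the server is fixed.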

StevenWingett commented 2 years ago

Hi Tatiana,

Thanks for letting me know; I have the same problem. I'll contact former colleagues at the Babraham Institute to find out what happened to these files.

Many thanks,

Steven

StevenWingett commented 2 years ago

Hi Tatiana,

The regular genomes should now have been re-uploaded to the server. Does the --get_genomes option now work for you?

(I'll sort out the Bismark/Bisulfite genomes in the next few days).

All the best, Steven

trfeuerborn commented 2 years ago

Hi Steven,

Thanks! It worked just fine for me now.

Thanks again for looking into it and re-uploading.

Tatiana


--

Tatiana R. Feuerborn, Ph.D., Ph.D. Postdoctoral Researcher

StevenWingett commented 2 years ago

The bisulfite genomes have now been re-made and uploaded to the FTP.

Further information has been added to the repo to help build the reference genomes if necessary at a future date (see commit af1dc39).

blurtime commented 2 years ago

Hi @StevenWingett, and first of all, thank you for this QC tool. I'm relatively new to the field, so please let me know if you have questions or need more information about my problem:

I wanted to use the prebuilt Bowtie2 indexes that are supposed to be available via --get_genomes, but the command always threw a 404 error (for the regular genomes location). Apparently, the Babraham FTP server where they were stored has had some hardware-related problems (http://ftp1.babraham.ac.uk), and it isn't clear when it will be up and running again.

I wanted to use FastQ-Screen to screen for rRNA contamination in my mouse .fq files, and now I can't download your prebuilt index for this purpose. The download_genomes/regular_genomes_config_file/fastq_screen.conf file describes it as "rRNA - In house custom database". Would you mind sharing either the index or the FASTA files used to create it, for as long as we can't download them the usual way? E.g. via Google Drive? Alternatively, would you mind telling me how this custom database was constructed? I did some research but didn't find a straightforward way to solve my problem, i.e. a FASTA file with rRNA sequences or another prebuilt index like yours. I looked at the SILVA database and searched for "mus (musculus)", but this seems to return only individual rRNA transcripts and not all the rRNA transcripts which may suggest contamination (?).

Would really appreciate your help! Thank you
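
For context, a locally built index can be registered in a FastQ Screen configuration file in the same way as the prebuilt ones. A minimal sketch of formatting such an entry, assuming the tab-separated `DATABASE <name> <index basename>` layout used in the example fastq_screen.conf. The helper name, the database name and the path are hypothetical placeholders.

```python
def database_entry(name, index_basename):
    """Format one tab-separated DATABASE line for a fastq_screen.conf file."""
    # Precaution (an assumption about how the file is parsed): keep the
    # database name free of whitespace so the fields stay unambiguous.
    if any(ch.isspace() for ch in name):
        raise ValueError("database name must not contain whitespace")
    return f"DATABASE\t{name}\t{index_basename}"

# Hypothetical location of a locally built mouse rRNA Bowtie2 index.
print(database_entry("Mouse_rRNA", "/path/to/indices/mouse_rRNA"))
```

The path is the index basename, i.e. the prefix shared by the Bowtie2 index files, rather than the name of any single file.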

StevenWingett commented 2 years ago

Hi,

Thank you for bringing this to my attention. I shall try to find out what has happened to the FTP site and retrieve those files.

Have you tried going to: https://www.ncbi.nlm.nih.gov/nuccore

Then search for Mouse rRNA and select the "rRNA" "Molecule Type" filter. This will return many sequences (maybe ignore the predicted sequences). If you now select those sequences and download the FASTA files, you should be able to create a Bowtie2 index.

Does that help?

All the best, Steven
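
The NCBI route suggested above could be sketched as follows: drop the predicted records from the downloaded FASTA, then pass the result to `bowtie2-build <reference_in> <bt2_base>`. This is only an illustration under stated assumptions: it assumes predicted records carry "PREDICTED" in their header line, and the function and file names are hypothetical. It is not how the original in-house database was made.

```python
import subprocess

def drop_predicted(fasta_text):
    """Return FASTA text without records whose header contains 'PREDICTED'."""
    kept = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Decide once per record, based on its header line.
            keep = "PREDICTED" not in line.upper()
        if keep:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")

def bowtie2_build_command(fasta_path, index_basename):
    """Build the argument list for `bowtie2-build <reference_in> <bt2_base>`."""
    return ["bowtie2-build", fasta_path, index_basename]

def build_index(downloaded_fasta, filtered_fasta, index_basename):
    """Filter the downloaded FASTA and run bowtie2-build (must be on PATH)."""
    with open(downloaded_fasta) as handle:
        filtered = drop_predicted(handle.read())
    with open(filtered_fasta, "w") as handle:
        handle.write(filtered)
    subprocess.run(bowtie2_build_command(filtered_fasta, index_basename), check=True)
```

With Bowtie2 installed, a call such as `build_index("mouse_rRNA_ncbi.fasta", "mouse_rRNA.fasta", "mouse_rRNA")` would write the index files under the `mouse_rRNA` basename in the current directory.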

StevenWingett commented 2 years ago

Hi again,

The --get_genomes feature should work once again. Please let me know if it does not work for you now.

All the best, Steven

blurtime commented 2 years ago

Hello @StevenWingett,

I haven't had a chance yet to try your first suggestion, but I guess you were faster than me. I just started the download. Thanks a lot for your help, I appreciate it!

All the best :)