RobertsLab / resources

https://robertslab.github.io/resources/

Trimming the reads from PSMFC byssus project #1767

Closed (graceleuchtenberger closed this 10 months ago)

graceleuchtenberger commented 10 months ago

Hi Matt, I was looking at your original code for the PSMFC byssus project and was wondering why, specifically, you trimmed three different adapter sequences. Did the UT sequencing facility provide guidance on what to trim? Thanks!


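# excerpt from the body of a loop over the input FASTQ files (loop header and $results_file assignment not shown)
# run cutadapt on each file
/home/shared/8TB_HDD_02/mattgeorgephd/.local/bin/cutadapt "$F" -a A{8} -a G{8} -a AGATCGG -u 15 -m 20 -o \
/home/shared/8TB_HDD_02/graceleuchtenberger/PSMFC-mytilus-byssus-pilot/trim-fastq/"$results_file"
done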
sr320 commented 10 months ago

Related: what is the tag-seq library protocol? Is there an insert size, or is it just sequencing in from the 3' end?

sr320 commented 10 months ago

My take on what the code is doing (needs confirmation):

cutadapt $F:
cutadapt is the tool being used.
$F is a placeholder for a file name. In a shell script, $F would be replaced by the value of the variable F, which should contain the name of the input file (usually a FASTQ file containing sequencing reads).
-a A{8}:
-a specifies a 3' adapter to trim: when cutadapt finds it in a read, the adapter and everything after it are removed.
A{8} indicates an adapter composed of 8 consecutive adenine (A) nucleotides.
-a G{8}:
Another adapter sequence to be trimmed.
G{8} indicates an adapter composed of 8 consecutive guanine (G) nucleotides.
-a AGATCGG:
Specifies another adapter sequence.
AGATCGG is the first seven bases of the standard Illumina adapter sequence (AGATCGGAAGAGC...).
-u 15:
-u removes a fixed number of bases from the beginning of each read (cutadapt does this before adapter trimming).
15 means that the first 15 bases from the start of each read will be removed.
-m 20:
-m sets the minimum length of reads to keep after trimming.
20 means that any reads shorter than 20 bases after trimming will be discarded.
-o:
-o specifies the output file where the trimmed reads will be saved.
What follows -o should be the name of the output file, but it seems to be missing in the command you provided.
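A quick way to check how `A{8}` and `G{8}` are interpreted. Bash only brace-expands `{...}` when it contains a comma or a range such as `{1..3}`, so the unquoted `A{8}` reaches cutadapt unchanged, and cutadapt's adapter notation reads `X{n}` as the base X repeated n times. The `expand_repeat` function below is a hypothetical helper written only for illustration (it is not part of cutadapt); it mimics that shorthand for a single repeated base:

```shell
#!/usr/bin/env bash

# Bash leaves A{8} alone: brace expansion needs a comma or a range.
echo A{8}

# Hypothetical helper (not part of cutadapt): expand a single-base
# repeat such as A{8} into the literal sequence cutadapt searches for.
expand_repeat() {
  local spec=$1
  local base count out i
  base=$(printf '%s' "$spec" | cut -c1)        # the repeated base, e.g. A
  count=$(printf '%s' "$spec" | tr -cd '0-9')  # the repeat count, e.g. 8
  out=""
  i=0
  while [ "$i" -lt "$count" ]; do
    out="$out$base"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

expand_repeat 'A{8}'
expand_repeat 'G{8}'
```

So `-a A{8} -a G{8}` asks cutadapt to look for a run of eight A's (a poly-A tail) or eight G's and, because both are given with `-a`, to remove the run and everything after it. Poly-G runs are a known artifact of two-color Illumina chemistry (e.g. NextSeq), which may be why `G{8}` was included.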
kubu4 commented 10 months ago

-o should be the name of the output file, but it seems to be missing in the command you provided.

I think it's there. The output file name is stored in the variable $results_file, which is at the end of that path after the -o option.
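To make the point about `$results_file` concrete, here is a sketch of the kind of loop the original snippet (which ends in `done`) sits inside. The loop header is not shown in the issue, so the `raw-fastq/` and `trim-fastq/` directories and the rule for building `$results_file` are assumptions, and `trimmed_name` is a hypothetical helper. The sketch only prints each cutadapt command (a dry run) rather than executing it:

```shell
#!/usr/bin/env bash

# Hypothetical naming rule: sample.fastq.gz -> sample.trim.fastq.gz
trimmed_name() {
  printf '%s.trim.fastq.gz\n' "$(basename "$1" .fastq.gz)"
}

in_dir=raw-fastq     # assumed input directory
out_dir=trim-fastq   # assumed output directory

for F in "$in_dir"/*.fastq.gz; do
  [ -e "$F" ] || continue   # no matching files: the glob stays literal, so skip it
  results_file=$(trimmed_name "$F")
  # Dry run: print the command; remove "echo" to actually run cutadapt.
  echo cutadapt "$F" -a 'A{8}' -a 'G{8}' -a AGATCGG -u 15 -m 20 \
    -o "$out_dir/$results_file"
done
```

Each printed command ends with `-o trim-fastq/<name>.trim.fastq.gz`, i.e. the output file name is supplied by the variable at the end of the path.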

mattgeorgephd commented 10 months ago

UT Austin uses the QuantSeq 3' mRNA protocol. Here is the Library Prep kit guide. I originally used their suggestions.

However! Looking through UT Austin's website, I discovered that they trimmed the files for us! Follow these steps:

  1. The job number for the PSMFC-mytilus-byssus-pilot was JA22078.

  2. You can use this job number to log in to the UT Austin tag-seq database using this link:

https://gsafjobs.icmb.utexas.edu/tagseq-data/

  3. You can get to the trimmed files by clicking this menu option:

[screenshot]

  4. All of the trimmed files are listed and can be downloaded as fastq.gz:

[screenshot]

sr320 commented 10 months ago

@mattgeorgephd how did you bulk download from this site? It seems like wget is prohibited.

--2023-12-13 06:44:48--  https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/
Resolving gsafjobs.icmb.utexas.edu (gsafjobs.icmb.utexas.edu)... 146.6.213.4
Connecting to gsafjobs.icmb.utexas.edu (gsafjobs.icmb.utexas.edu)|146.6.213.4|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-12-13 06:44:49 ERROR 403: Forbidden.
mattgeorgephd commented 10 months ago

@sr320 I didn't. GSAF sent me an Illumina BaseSpace link with all of the untrimmed .fastq.gz files. It wasn't clear that they had provided trimmed versions until I logged into their job tracker. We might have to request a BaseSpace link for bulk download.

graceleuchtenberger commented 10 months ago

That's great that there are trimmed files available! Can I request the link through your account, or do you need to email them?

kubu4 commented 10 months ago

@sr320 - Bulk download via wget can be done like this:

wget -e robots=off -np --input-file=download.txt

Where download.txt contains:

https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A1-T001F_S100_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A1-T030F_S196_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A10-T015F_S172_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A11-T027F_S180_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A12-T021G_S188_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A2-T009F_S108_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A2-T039F_S204_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A3-T005G_S116_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A3-T035G_S212_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A4-T002FX_S220_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A4-T126F_S124_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A5-T010FX_S228_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A5-T134F_S132_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A6-T130G_S140_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A7-T046F_S148_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A8-T110F_S156_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/A9-T055G_S164_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B1-T002F_S101_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B1-T031F_S197_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B10-T016F_S173_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B11-T118F_S181_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B12-T023G_S189_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B2-T010F_S109_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B2-T040F_S205_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B3-T006G_S117_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B3-T036G_S213_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B4-T003FX_S221_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B4-T127F_S125_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B5-T011FX_S229_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B5-T135F_S133_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B6-T131G_S141_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B7-T049F_S149_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B8-T111F_S157_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/B9-T056G_S165_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C1-T003F_S102_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C1-T033F_S198_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C10-T017F_S174_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C11-T119F_S182_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C12-T025G_S190_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C2-T011F_S110_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C2-T041F_S206_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C3-T007G_S118_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C3-T037G_S214_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C4-T004FX_S222_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C4-T128F_S126_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C5-T012FX_S230_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C5-T136F_S134_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C6-T132G_S142_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C7-T051F_S150_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C8-T112F_S158_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/C9-T057G_S166_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D1-T004F_S103_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D1-T034F_S199_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D10-T019F_S175_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D11-T014G_S183_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D12-T026G_S191_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D2-T012F_S111_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D2-T029G_S207_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D3-T008G_S119_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D3-T038G_S215_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D4-T005FX_S223_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D4-T129F_S127_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D5-T137F_S135_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D6-T133G_S143_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D7-T052F_S151_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D8-T046G_S159_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/D9-T058G_S167_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E1-T005F_S104_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E1-T035F_S200_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E10-T021F_S176_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E11-T015G_S184_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E12-T027G_S192_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E2-T001G_S112_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E2-T030G_S208_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E3-T009G_S120_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E3-T039G_S216_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E4-T006FX_S224_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E4-T130F_S128_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E5-T126G_S136_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E6-T134G_S144_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E7-T055F_S152_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E8-T047G_S160_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/E9-T110G_S168_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F1-T006F_S105_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F1-T036F_S201_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F10-T023F_S177_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F11-T016G_S185_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F12-T118G_S193_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F2-T002G_S113_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F2-T031G_S209_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F3-T010G_S121_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F3-T040G_S217_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F4-T007FX_S225_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F4-T131F_S129_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F5-T127G_S137_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F6-T135G_S145_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F7-T056F_S153_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F8-T049G_S161_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/F9-T111G_S169_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G1-T007F_S106_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G1-T037F_S202_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G10-T025F_S178_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G11-T017G_S186_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G12-T119G_S194_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G2-T003G_S114_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G2-T033G_S210_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G3-T011G_S122_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G3-T041G_S218_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G4-T008FX_S226_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G4-T132F_S130_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G5-T128G_S138_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G6-T136G_S146_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G7-T057F_S154_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G8-T051G_S162_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/G9-T112G_S170_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H1-T008F_S107_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H1-T038F_S203_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H10-T026F_S179_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H11-T019G_S187_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H12-T029F_S195_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H2-T004G_S115_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H2-T034G_S211_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H3-T001FX_S219_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H3-T012G_S123_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H4-T009FX_S227_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H4-T133F_S131_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H5-T129G_S139_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H6-T137G_S147_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H7-T058F_S155_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H8-T052G_S163_L099_R1_cmb.trim.fastq.gz
https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/H9-T014F_S171_L099_R1_cmb.trim.fastq.gz
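Once wget finishes, a quick sanity check is to confirm that every file named in download.txt actually arrived. This is a minimal sketch; `missing_downloads` is a hypothetical helper, and its directory argument is wherever wget saved the files (the current directory with the command above):

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the file names listed (as URLs, one per line)
# in a download list that are not present in the given directory.
missing_downloads() {
  local list=$1   # e.g. download.txt
  local dir=$2    # directory wget saved into
  local url name
  while IFS= read -r url; do
    [ -n "$url" ] || continue        # skip blank lines
    name=${url##*/}                  # file name = text after the last "/"
    [ -e "$dir/$name" ] || printf '%s\n' "$name"
  done < "$list"
}
```

For example, `missing_downloads download.txt .` prints nothing when all files are present; `gzip -t *.fastq.gz` can then check that each archive is intact.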
kubu4 commented 10 months ago

For reference, this was how I created that file:

  1. Highlight and copy the table on that page listing all of the trimmed files.
  2. Paste it into a file (here, wget.txt). It looks like:

Trimmed FastQ   FastQC Report
A1-T001F_S100_L099_R1_cmb.trim.fastq.gz A1-T001F_S100_L099_R1_cmb.trim_fastqc.html
A1-T030F_S196_L099_R1_cmb.trim.fastq.gz A1-T030F_S196_L099_R1_cmb.trim_fastqc.html
A10-T015F_S172_L099_R1_cmb.trim.fastq.gz    A10-T015F_S172_L099_R1_cmb.trim_fastqc.html
A11-T027F_S180_L099_R1_cmb.trim.fastq.gz    A11-T027F_S180_L099_R1_cmb.trim_fastqc.html
A12-T021G_S188_L099_R1_cmb.trim.fastq.gz    A12-T021G_S188_L099_R1_cmb.trim_fastqc.html
A2-T009F_S108_L099_R1_cmb.trim.fastq.gz A2-T009F_S108_L099_R1_cmb.trim_fastqc.html
A2-T039F_S204_L099_R1_cmb.trim.fastq.gz A2-T039F_S204_L099_R1_cmb.trim_fastqc.html
A3-T005G_S116_L099_R1_cmb.trim.fastq.gz A3-T005G_S116_L099_R1_cmb.trim_fastqc.html
  3. Remove the header from the file and keep just the list of FastQs:

awk 'NR > 2 {print $1}' wget.txt > download.txt

  4. Copy the base URL from one of the individual files and store it in a variable:

url=https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/

  5. Prepend the URL to each filename (% is used as the sed delimiter because the URL contains slashes):

sed -i "s%^%${url}%" download.txt

EDITED: Added note to step 3 about FastQs.
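The last three steps (drop the header, store the base URL, prepend it) can also be collapsed into a single awk call. This is a variation, not the exact method above: instead of skipping a fixed number of header lines with `NR > 2`, it keeps only first-column entries ending in `.fastq.gz`, so it does not depend on how many header lines the pasted table has. `make_download_list` is a hypothetical helper name:

```shell
#!/usr/bin/env bash

# Base URL copied from one of the individual file links.
url=https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/

# Hypothetical helper: read the pasted table ($1), keep first-column
# entries that end in .fastq.gz, prefix each with $url, and write to $2.
make_download_list() {
  awk -v base="$url" '$1 ~ /\.fastq\.gz$/ { print base $1 }' "$1" > "$2"
}
```

Usage: `make_download_list wget.txt download.txt`, then `wget -e robots=off -np --input-file=download.txt` as before.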

sr320 commented 10 months ago

@kubu4 so that works for you?

This code did not work for me, and I'm not seeing anything fundamentally different.

wget -r \
--no-directories --no-parent \
-P ../data \
-A .fastq.gz https://gsafjobs.icmb.utexas.edu/tagseq_prep/JA22078_SA22060/fq.trim/ \
--no-check-certificate
kubu4 commented 10 months ago

Yes, the code I posted works. You should use that. I also tried the -A option and got the 403 Forbidden error. Providing a list of files works, though.

graceleuchtenberger commented 10 months ago

It's working on mine, Sam, thanks!

kubu4 commented 10 months ago

The primary difference is that using the -A option requires (I think) that the server will provide a directory index to wget. If they've configured the server to not provide this, then it won't work because wget can't browse the index for matching file types.

However, the list-of-files method uses direct links to the individual files, so wget doesn't need the directory index to figure out what to download.

mattgeorgephd commented 10 months ago

Wahoo! Thanks @kubu4.