RobertsLab / resources

https://robertslab.github.io/resources/

Perform Copy Number Variation analysis on C gigas samples #1656

Open sr320 opened 1 year ago

sr320 commented 1 year ago

For @ggoetznoaa; @mgavery can advise.

TLDR: Nightingales Folder - F05/F14 prefix <- raw data

Short-read WGS, 30x coverage, 32 samples, 2 families, 8 samples per ploidy/family. See notebook post for more information.

WGS results received from Azenta. Stored on nightingales. Initial QC and trimming done on Raven. See notebook post.


Pertinent documents

repo

Manuscript Proposal
Tissue Sample List
https://github.com/RobertsLab/resources/issues/1304
DNA extraction protocol
DNA extraction results
Gannet Folder
Nightingales Folder - F05/F14 prefix <- raw data

ggoetznoaa commented 1 year ago

@sr320 I see the files but I can't download them. It just sorta hangs when I try either via the web browser or via command line program. Do I need a login/password?

ggoetznoaa commented 1 year ago

@sr320 oh, and I never got an email saying you tagged me. Not sure if it's something on my end that needs to be changed or on yours. I usually don't use GitHub for this sort of stuff.

kubu4 commented 1 year ago

Hmmm, sorry you're having trouble! All the files are publicly accessible, so no need for a password.

I just tested downloading a file and it proceeded without issue:

wget https://owl.fish.washington.edu/nightingales/C_gigas/F142n01_R1_001.fastq.gz

Obviously, this isn't too much help, since it doesn't solve your problem...

ggoetznoaa commented 1 year ago

It could be something on NOAA's side (firewall settings, etc.). I was able to get your command to run, but when I tried the following it failed:

wget http://owl.fish.washington.edu/nightingales/C_gigas/0501_R1_001.fastq.gz

ggoetznoaa commented 1 year ago

Ah, I think I see the issue. The link in the doc doesn't have https, so Firefox went with http, and that doesn't seem to be working.

ggoetznoaa commented 1 year ago

Yep, it's working as soon as I added https.
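For anyone hitting the same hang, here is a minimal sketch of forcing https before downloading. The helper names `to_https` and `fetch_all`, and the idea of keeping the links in a one-URL-per-line text file, are assumptions for illustration and are not part of the lab's documentation.

```shell
#!/usr/bin/env bash

# Rewrite an http:// URL to https://; any other input is printed unchanged.
to_https() {
  case "$1" in
    http://*) printf 'https://%s\n' "${1#http://}" ;;
    *)        printf '%s\n' "$1" ;;
  esac
}

# Download every URL listed in a text file (one per line), forcing https.
# wget -c resumes a partial file if a transfer is interrupted.
fetch_all() {
  local list="$1" url
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # skip blank lines
    wget -c "$(to_https "$url")"
  done < "$list"
}
```

Usage would look like `fetch_all urls.txt`, where `urls.txt` is a hypothetical file holding the links copied from the doc.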

ggoetznoaa commented 1 year ago

OK, I've downloaded the files. I'm assuming the files I've downloaded are the raw files and not the trimmed/QC'd files Matt mentions in his lab doc. I see all 32 samples (16 F05 and 16 F14).

sr320 commented 1 year ago

Correct

sr320 commented 1 year ago

@ggoetznoaa just checking in to see where this is at?

mgavery commented 1 year ago

It's almost ready to move to my plate! Giles has done the mapping and pulled coverage for single copy genes, mito and ribo using bedtools. He is going to make tables that consolidate coverage for all samples into a single table and then I'll analyze in R.
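The consolidation step could look roughly like the sketch below. It is not Giles's actual script: it assumes each sample has its own two-column, tab-separated file (gene ID, then coverage) named after the sample, and the function name `merge_coverage` is made up for illustration.

```shell
#!/usr/bin/env bash

# merge_coverage FILE...
# Combine per-sample coverage files into one wide table:
# one row per gene, one column per sample, NA where a gene is missing.
merge_coverage() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {                        # first line of a new input file
      n++
      name = FILENAME
      sub(/.*\//, "", name)           # drop the directory part
      sub(/\.[^.]*$/, "", name)       # drop the file extension
      sample[n] = name                # column header = file name stem
    }
    {
      if (!($1 in seen)) { seen[$1] = 1; genes[++g] = $1 }   # keep first-seen gene order
      cov[$1, n] = $2
    }
    END {
      line = "gene"
      for (i = 1; i <= n; i++) line = line OFS sample[i]
      print line
      for (j = 1; j <= g; j++) {
        line = genes[j]
        for (i = 1; i <= n; i++)
          line = line OFS (((genes[j], i) in cov) ? cov[genes[j], i] : "NA")
        print line
      }
    }' "$@"
}
```

Something like `merge_coverage coverage/*.tsv > single_copy_all_samples.tsv` (hypothetical paths) would then give one table that loads directly into R with `read.delim`.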

sr320 commented 1 year ago

Can I get access to bam files?

ggoetznoaa commented 1 year ago

@mgavery yep, just finished making the three different table files.

@sr320 I still have the BAM files; there are 32 files totaling 172 GB. I can put them in a Google Drive folder for you to download, unless you have another place you want me to put them.

sr320 commented 1 year ago

@ggoetznoaa can you put them in the Mox scrubbed directory?

ggoetznoaa commented 1 year ago

@sr320 I have no idea what that is or where that is. Is it a server you have? If so, I don't think I have access to it.

sr320 commented 1 year ago

Mox is part of Hyak: mox.hyak.uw.edu. You have an account.

The other suggestion is fine, but Google Docs?? We cannot do anything with files that big in Google Docs.

ggoetznoaa commented 1 year ago

I don't have access to hyak anymore or at least I haven't tried to access it in a while. I can take a look though.

As for Google, I basically would put the files up in a folder on Google Drive (not Docs). I would then send a link to you for that folder and then you can just download the files. But that would require you to have the hard drive space on your computer and a decent internet connection. Currently I use a command line tool called rclone to move large files up and down from Google Drive.
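A rough sketch of what that rclone upload might look like, assuming a Google Drive remote has already been set up with `rclone config`. The remote name `gdrive`, the destination folder, the `.bam.bai` index naming, and the wrapper name `upload_bams` are all assumptions, not details from this thread.

```shell
#!/usr/bin/env bash

# upload_bams SRC_DIR REMOTE:FOLDER
# Copy BAM files and their indexes from a local directory to a configured
# rclone remote. Files already present and unchanged on the remote are skipped.
upload_bams() {
  local src="$1" dest="$2"
  rclone copy "$src" "$dest" \
    --include '*.bam' \
    --include '*.bam.bai' \
    --progress
}

# Example call (hypothetical paths and remote name):
# upload_bams /path/to/bams gdrive:cgigas_bams
```

The same `rclone copy` with source and destination swapped pulls the files back down on the receiving end.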

I'm open to other ideas if you got any.

G.

ggoetznoaa commented 1 year ago

Yep, I don't have access to Hyak anymore. I just followed the instructions here.

https://wiki.cac.washington.edu/display/hyakusers/Logging+In

And I'm not seeing the option to activate Hyak.

[Image: screenshot]

I would need someone to sponsor an account for me.

mgavery commented 1 year ago

@sr320 I have an acct on gannet. I can put them there. Does that work?

sr320 commented 1 year ago

@mgavery yes that would be great. thanks!

ggoetznoaa commented 1 year ago

@mgavery the sorted/indexed BAM files are located in

/share/nwfsc/ggoetz/202306-c.gigas-cnv/bowtie2/cgigas_ref

On Sedna

mgavery commented 1 year ago

@sr320 bams are here: /var/services/homes/charlie/Cgigas_WGS_bams

mgavery commented 1 year ago

[Image: Cursor_and_Plot_Zoom]

mgavery commented 1 year ago

First pass at looking at coverage per sample for ~5000 single copy genes (top left) and then average copy number of mito and ribo genes
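One way such a copy-number estimate can be derived from coverage is to divide the mean coverage of the target genes by the mean coverage of the single-copy genes in the same sample. The sketch below shows that ratio only; the normalization, the two-column tab-separated input (gene ID, coverage), and the name `relative_copy_number` are assumptions and may differ from what was actually done in R.

```shell
#!/usr/bin/env bash

# relative_copy_number SINGLE_COPY_FILE TARGET_FILE
# Print mean(target coverage) / mean(single-copy coverage) for one sample,
# or NA if either file is empty or the baseline coverage is zero.
relative_copy_number() {
  awk -F'\t' '
    NR == FNR { base += $2; nb++; next }   # first file: single-copy genes (baseline)
    { target += $2; nt++ }                 # second file: mito or ribo genes
    END {
      if (nb == 0 || nt == 0 || base == 0) { print "NA"; exit }
      printf "%.2f\n", (target / nt) / (base / nb)
    }' "$1" "$2"
}
```

For a sample whose single-copy genes average 30x and whose mito genes average 4500x, this prints `150.00`.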

mgavery commented 1 year ago

This is a plot to see individual variation. Color indicates family and number on each bar indicates ploidy.

[Image: Plot_Zoom]

mattgeorgephd commented 1 year ago

Amazing the variation within family/ploidy. Some oysters have x3!

mgavery commented 1 year ago

Dip_v_trip_mito_ribo_copynum.csv

This is the file the plots were generated from. I haven't done any stats yet.

sr320 commented 1 year ago

Here is what I have for global analysis... https://rpubs.com/sr320/1070681