paul-shannon closed this issue 1 year ago.
Hi @paul-shannon,
The subsetting solution looks great :). I will switch chr22 over to chr22-sub :).
I've checked all the files that used to be fetched from http://igv-data.systemsbiology.net. See the list below: each row begins with the status, followed by the URL of the file.
[NOT FOUND] https://igv-data.systemsbiology.net/static/bamtests/x.bam
[COPIED] https://igv-data.systemsbiology.net/static/testFiles/DNase.bam
[COPIED] https://igv-data.systemsbiology.net/static/testFiles/ndufs2-hg38-simple.bed.gz
[COPIED] https://igv-data.systemsbiology.net/static/testFiles/wgEncodeBroadHistoneGm12878H3k4me3StdSig.bigWig
[COPIED] https://igv-data.systemsbiology.net/static/testFiles/ndufs2-hg38-simple2.bed.gz
[COPIED] https://igv-data.systemsbiology.net/testFiles/GRCh38.94.NDUFS2.gff3
[COPIED] https://igv-data.systemsbiology.net/misc/Homo_sapiens.GRCh38.94.chr.gff3.gz
[COPIED] https://igv-data.systemsbiology.net/misc/Homo_sapiens.GRCh38.94.chr.gff3.gz.tbi
[COPIED] https://igv-data.systemsbiology.net/testFiles/gwas/bellenguez.gwas
[COPIED] https://igv-data.systemsbiology.net/testFiles/gwas/bellenguez.bed
[COPIED] https://igv-data.systemsbiology.net/testFiles/gwas/carolin.gwas
[COPIED] https://igv-data.systemsbiology.net/testFiles/gwas/gwas_sample_tiny.tsv
[COPIED] https://igv-data.systemsbiology.net/testFiles/gwas/tbl.gwas.yeast.chrV.tsv
[COPIED] https://igv-data.systemsbiology.net/tair10/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.fai
[COPIED] https://igv-data.systemsbiology.net/tair10/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa
[COPIED] https://igv-data.systemsbiology.net/tair10/TAIR10_genes.sorted.chrLowered.gff3.gz
[COPIED] https://igv-data.systemsbiology.net/Pfalciparum3D7/PlasmoDB-43_Pfalciparum3D7_Genome.fasta
[COPIED] https://igv-data.systemsbiology.net/Pfalciparum3D7/PlasmoDB-43_Pfalciparum3D7_Genome.fasta.fai
[COPIED] https://igv-data.systemsbiology.net/Pfalciparum3D7/PlasmoDB-43_Pfalciparum3D7.gff
[COPIED] http://igv-data.systemsbiology.net/static/rhos/GCF_000012905.2_ASM1290v2_genomic.fna.fai
[COPIED] https://igv-data.systemsbiology.net/static/tmp/chr19sub.bed
[TOO LARGE - 36G] https://igv-data.systemsbiology.net/ampad/NIA-1898/chr7.vcf.gz
[TOO LARGE - 51G] https://igv-data.systemsbiology.net/ampad/NIA-1898/chr2.vcf.gz
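A check like the one above could be scripted. Below is a minimal Python sketch, assuming that a HEAD request is enough to spot missing files and that a fixed size cutoff decides what counts as "too large". `classify`, `head`, `report` and the 1 GiB `MAX_BYTES` threshold are hypothetical, not what was actually used for this list, and "OK" here only means "reachable and small enough to copy".

```python
from __future__ import annotations

import urllib.error
import urllib.request

# Hypothetical cutoff (an assumption): files larger than this are flagged, not copied.
MAX_BYTES = 1 << 30  # 1 GiB


def classify(status: int, content_length: int | None, max_bytes: int = MAX_BYTES) -> str:
    """Map an HTTP status and Content-Length to a label like those in the list."""
    if status == 404:
        return "NOT FOUND"
    if status >= 400:
        return f"ERROR {status}"
    if content_length is not None and content_length > max_bytes:
        return "TOO LARGE"
    return "OK"


def head(url: str, timeout: float = 10.0) -> tuple[int, int | None]:
    """Send a HEAD request and return (status, Content-Length or None)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            length = resp.headers.get("Content-Length")
            return resp.status, int(length) if length is not None else None
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; keep only the status code.
        return err.code, None


def report(urls: list[str]) -> list[str]:
    """Build '[LABEL] url' lines. This performs real network requests."""
    return [f"[{classify(*head(u))}] {u}" for u in urls]
```

Network failures (DNS, timeouts) raise `urllib.error.URLError` and are deliberately not caught, so an unreachable host is not silently reported as "NOT FOUND".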
As you can see, I've managed to copy most of them. I will need your help with:
• the two large VCF files; it would be great to prepare subsets the same way as for chr22
• one file with a broken link: https://igv-data.systemsbiology.net/static/bamtests/x.bam
Once we solve all these issues, we should be able to merge https://github.com/gladkia/igvR/pull/35.
@paul-shannon regarding using a subset of the VCF for chr22 (ampad/NIA-1898): it's already done [link].
Hi Arek,
x.bam is now at https://igv-data.systemsbiology.net/bamtests/x.bam
Note that the “static” subdirectory is no longer in the URL.
As for trimming chr7 and chr2:
[TOO LARGE - 36G] https://igv-data.systemsbiology.net/ampad/NIA-1898/chr7.vcf.gz
[TOO LARGE - 51G] https://igv-data.systemsbiology.net/ampad/NIA-1898/chr2.vcf.gz
I need to know the small region of interest in each file. Is that handy for you to find out?
Hi Paul,
Awesome. Fixed.
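Paul's note that the “static” segment is gone suggests a mechanical URL rewrite. A minimal Python sketch, assuming (not confirmed in this thread) that any other old `/static/...` path moved the same way; `drop_static` is a hypothetical helper, not part of igvR:

```python
from urllib.parse import urlsplit, urlunsplit


def drop_static(url: str) -> str:
    """Remove a leading 'static' path segment; other URLs come back unchanged."""
    parts = urlsplit(url)
    segments = parts.path.split("/")  # "/static/a/b" -> ["", "static", "a", "b"]
    if len(segments) > 1 and segments[1] == "static":
        del segments[1]
    # Scheme, host, query and fragment are carried over untouched.
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Only the first path segment is inspected, so a directory that merely contains "static" deeper in the path is left alone.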
For chr7, 100,330,000-100,340,000 should suffice (https://github.com/gladkia/igvR/blob/master/inst/demos/vcfDemo.R#L17-L21).
For chr2, maybe 1,099,000-1,104,000 (https://github.com/gladkia/igvR/blob/master/inst/demos/vcfDemo.R#L45-L49)?
Arek
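The regions requested above can be turned into a VCF subset. A minimal plain-text Python sketch, assuming an uncompressed VCF whose CHROM column uses the same names as the regions (`chr7` rather than `7`; not checked against the AMPAD files). `parse_span` and `subset_vcf` are hypothetical helpers, not the method actually used to make the files below. For bgzipped, indexed files, a common alternative is `tabix -h chr7.vcf.gz chr7:100330000-100340000`, where `-h` also prints the header.

```python
import re
from collections.abc import Iterable, Iterator

_SPAN = re.compile(r"\s*([\d,]+)\s*-\s*([\d,]+)\s*")


def parse_span(span: str) -> tuple[int, int]:
    """Turn '100,330,000-100,340,000' into (100330000, 100340000)."""
    m = _SPAN.fullmatch(span)
    if m is None:
        raise ValueError(f"cannot parse span: {span!r}")
    start, end = (int(g.replace(",", "")) for g in m.groups())
    if start > end:
        raise ValueError(f"start after end: {span!r}")
    return start, end


def subset_vcf(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield all header lines plus records with CHROM == chrom and start <= POS <= end."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep ## meta lines and the #CHROM column header
            continue
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t", 2)  # only CHROM and POS are needed
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line
```

This filters on POS only, so a deletion that starts just before the window but overlaps it is dropped; tabix region queries return overlapping records instead.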
@gladkia - Hi Arek,
I think these new smaller files give you what you asked for:
URL | size (bytes)
https://igv-data.systemsbiology.net/ampad/NIA-1898/chr2-sub.vcf | 11182327
https://igv-data.systemsbiology.net/ampad/NIA-1898/chr2-sub.vcf.bgz | 2153591
https://igv-data.systemsbiology.net/ampad/NIA-1898/chr2-sub.vcf.bgz.tbi | 110
https://igv-data.systemsbiology.net/ampad/NIA-1898/chr7-sub.vcf | 16107528
https://igv-data.systemsbiology.net/ampad/NIA-1898/chr7-sub.vcf.bgz | 2551332
https://igv-data.systemsbiology.net/ampad/NIA-1898/chr7-sub.vcf.bgz.tbi | 226
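The table also shows how much BGZF compression saves on these subsets. A small Python check using only the byte counts listed above; `SIZES` and `bgzip_ratio` are illustrative names:

```python
# Sizes in bytes, copied from the table of chr2/chr7 subset files.
SIZES = {
    "chr2": {"vcf": 11182327, "bgz": 2153591, "tbi": 110},
    "chr7": {"vcf": 16107528, "bgz": 2551332, "tbi": 226},
}


def bgzip_ratio(sizes: dict[str, int]) -> float:
    """Fraction of the plain VCF size that remains after BGZF compression."""
    return sizes["bgz"] / sizes["vcf"]


if __name__ == "__main__":
    for chrom, sizes in SIZES.items():
        print(f"{chrom}: bgz is {bgzip_ratio(sizes):.1%} of the plain VCF")
```

Run as a script, it reports the bgzipped files at 19.3% (chr2) and 15.8% (chr7) of the plain VCF size.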
@gladkia - Hi Arek,
I just filtered the chr22 AMPAD VCF file, creating chr22-sub, which is 0.16% of the original size.
I hope this 15 MB file is a good fit for your new hosting at gladkia.pl.
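Taking the two figures above at face value (reading "15M" as 15 MB and treating 0.16% as exact, both assumptions), they imply a rough size for the full chr22 file:

```latex
\text{original size} \approx \frac{15\ \text{MB}}{0.0016} = 9{,}375\ \text{MB} \approx 9.4\ \text{GB}
```

Because 0.16% is a rounded percentage, this is only an order-of-magnitude estimate.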
Sample code below. Do you need more from me along these lines? Glad to provide it if so.
The full chr22 file, although chr22 is one of the smallest chromosomes, is so large because there are many samples. Here is some minimal code to display this file: