Closed kelly-sovacool closed 3 months ago
I picked BAM files for chromosome 22 from an example run (/data/CCBR/projects/techDev/runs/gui/hg38_pair-y_cnv-y_ffpe-y/bams/chrom_split), used samtools to convert them to FASTQ, and gzipped the results. However, XAVIER expects paired-end FASTQ input, and with this method both mates of each pair end up combined in a single file. How can I make faux read pairs from these chr22 FASTQ files?
Solution: I realized all the headers end in /1 or /2 to designate the forward and reverse reads, so I can split the file into two based on the fastq headers. https://www.biostars.org/p/141256/
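For anyone reproducing this, here is a minimal sketch of the split described above. It assumes every header's read name ends in /1 or /2 and that each record is exactly four lines; the function name and file paths are hypothetical, and this is not necessarily the exact approach from the linked Biostars thread.

```python
import gzip
from itertools import islice


def split_by_mate(in_path, r1_path, r2_path):
    """Write records whose read name ends in /1 to r1_path and /2 to r2_path.

    Returns the number of records written to each file.
    """
    counts = {"/1": 0, "/2": 0}
    with gzip.open(in_path, "rt") as src, \
         gzip.open(r1_path, "wt") as r1, \
         gzip.open(r2_path, "wt") as r2:
        outputs = {"/1": r1, "/2": r2}
        while True:
            record = list(islice(src, 4))  # one FASTQ record = 4 lines
            if not record:
                break
            if len(record) < 4:
                raise ValueError("truncated FASTQ record at end of file")
            fields = record[0].split()
            name = fields[0] if fields else ""  # read name without comment field
            suffix = name[-2:]
            if suffix not in outputs:
                raise ValueError(f"header lacks /1 or /2 suffix: {record[0]!r}")
            outputs[suffix].writelines(record)
            counts[suffix] += 1
    return counts["/1"], counts["/2"]
```

If the two returned counts differ, some reads lost their mate during the chromosome split, and the resulting files would probably need to be re-paired before a paired-end pipeline accepts them.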
Now running into a weird issue with symlinking on biowulf?
dryrun
/data/CCBR/projects/techDev/XAVIER/bin/xavier run --runmode dryrun --input /data/CCBR/projects/techDev/test_xavier/data/fastqs_deinterleaved/*.fastq.gz --output /data/CCBR/projects/techDev/test_xavier/results/hg38_pair-n_cnv-n_ffpe-n --genome hg38 --targets /data/CCBR/projects/techDev/XAVIER/resources/Agilent_SSv7_allExons_hg38.bed
output
xavier
[-] Unloading samtools 1.17 ...
[-] Unloading snakemake 7.19.1
[+] Loading singularity 3.10.5 on cn4270
[+] Loading snakemake 7.19.1
xavier (v1.1)
Traceback (most recent call last):
File "/vf/users/CCBR/projects/techDev/XAVIER/xavier", line 731, in <module>
main()
File "/vf/users/CCBR/projects/techDev/XAVIER/xavier", line 727, in main
args.func(args)
File "/vf/users/CCBR/projects/techDev/XAVIER/xavier", line 96, in run
config = setup(sub_args,
^^^^^^^^^^^^^^^
File "/vf/users/CCBR/projects/techDev/XAVIER/src/run.py", line 174, in setup
ifiles = sym_safe(input_data = links, target = output_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vf/users/CCBR/projects/techDev/XAVIER/src/run.py", line 97, in sym_safe
os.symlink(os.path.abspath(os.path.realpath(file)), renamed)
FileNotFoundError: [Errno 2] No such file or directory: '/vf/users/CCBR/projects/techDev/test_xavier/data/fastqs_deinterleaved/sample1-normal.chr22.split.R1.fastq.gz' -> '/vf/users/CCBR/projects/techDev/test_xavier/hg38_pair-n_cnv-n_ffpe-n/sample1-normal.chr22.split.R1.fastq.gz'
I also get this error from the GUI.
Solution: need to run init before dryrun.
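One plausible reading of the traceback, not confirmed here, is that the output directory did not exist yet, so os.symlink could not create the link inside it. The sketch below reproduces that failure mode in isolation. The helper link_into and its create_dir flag are hypothetical and are not XAVIER's sym_safe; they only mirror the os.symlink call shown in the traceback.

```python
import os
import tempfile


def link_into(src, out_dir, create_dir=False):
    """Symlink src into out_dir, optionally creating out_dir first."""
    if create_dir:
        os.makedirs(out_dir, exist_ok=True)
    dest = os.path.join(out_dir, os.path.basename(src))
    # Same call shape as in the traceback above.
    os.symlink(os.path.abspath(os.path.realpath(src)), dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sample.R1.fastq.gz")
        open(src, "w").close()                   # source file exists
        out_dir = os.path.join(tmp, "results")   # destination dir does not
        try:
            link_into(src, out_dir)
        except FileNotFoundError as err:
            print("symlink failed:", err)
        dest = link_into(src, out_dir, create_dir=True)
        print("linked:", os.path.islink(dest))
```

The source file exists in both attempts, so the error comes from the missing destination directory, which would be consistent with init having to run first.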
Selected raw reads that mapped to a small region of chromosome 22. Now testing on biowulf.
https://github.com/CCBR/XAVIER/tree/9fcd76bb9474ee76c919c34bf8a5a99925bae864/tests
Regions for test dataset need to have enough coverage to make it through somalier analysis: https://github.com/brentp/somalier/issues/50
Solution: if fewer than e.g. 20 chromosomes, just touch the somalier output file instead of running it.
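A rough sketch of that guard, written as plain Python rather than as the actual Snakemake rule. The function name, the run_somalier callable, and the idea of passing in the list of chromosome names are all assumptions for illustration; the threshold of 20 is the "e.g." value from the comment above.

```python
from pathlib import Path

MIN_CHROMS = 20  # illustrative threshold taken from the comment above


def somalier_or_placeholder(chrom_names, output_path, run_somalier,
                            min_chroms=MIN_CHROMS):
    """Run somalier only when enough distinct chromosomes are present.

    Otherwise create an empty output file so downstream steps still find it.
    Returns True if somalier was run, False if a placeholder was touched.
    """
    if len(set(chrom_names)) < min_chroms:
        Path(output_path).touch()  # same effect as `touch` on the command line
        return False
    run_somalier()  # hypothetical callable wrapping the real somalier step
    return True
```

Anything that parses the somalier output would also need to tolerate an empty file for this to work end to end.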
Currently this test dataset works with paired/cnv off, but fails otherwise. Will need to further refine it to figure out why.
The new subsampled dataset in tests/data/ will fail with --cnv and on somalier, but there's now a larger 25% subset available on biowulf that works for these steps: /data/CCBR_Pipeliner/testdata/XAVIER/human_subset. This should be good enough for our purposes.
Subset to keep all reads that aligned to just one chromosome. This is better than random sampling because read depth stays high on the retained chromosome.
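To make the selection rule concrete, here is a small sketch that filters SAM-format text lines by reference name. It only illustrates the rule; in practice this kind of subsetting would more likely be done with samtools on an indexed BAM. The function name is hypothetical.

```python
def keep_chromosome(sam_lines, chrom):
    """Yield SAM header lines plus alignments whose RNAME equals chrom."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # RNAME is the third tab-separated column of an alignment record.
        if len(fields) > 2 and fields[2] == chrom:
            yield line
```

Because every read on the chosen chromosome is kept, per-base coverage there is unchanged, whereas random sampling would thin coverage everywhere.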
In progress on branch tests_iss-27.