drneavin / Demultiplexing_Doublet_Detecting_Docs

MIT License

Issues with Assign_Indiv_by_Geno.R script #35

Closed MrLocuace closed 9 months ago

MrLocuace commented 9 months ago

Hi @drneavin, I am running this test script:

apptainer exec --bind /project/PI/USERS/me/czi/data/scrna/ Demuxafy.sif Assign_Indiv_by_Geno.R -r /project/PI/USERS/me/czi/data/scrna/fastqs/skip_remap_OUTS/LB-CC-20s-01-NI-GEX-b1-FC02_souporcell/batch1.MAF0.05.vcf.gz -c /project/PI/USERS/me/czi/data/scrna/fastqs/skip_remap_OUTS/LB-CC-20s-01-NI-GEX-b1-FC02_souporcell/cluster_genotypes.vcf -o /project/PI/USERS/me/czi/data/scrna/fastqs/skip_remap_OUTS/LB-CC-20s-01-NI-GEX-b1-FC02_souporcell

I get:

System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
Scanning file to determine attributes.
File attributes:
meta lines: 33
header_line: 34
variant count: 3804937
column count: 35
Meta line 33 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
Character matrix gt rows: 3804937
Character matrix gt cols: 35
skip: 0
nrows: 3804937
row_num: 0
Processed variant: 3804937
All variants processed
Scanning file to determine attributes.
File attributes:
meta lines: 40
header_line: 41
variant count: 130285
column count: 12
Meta line 40 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
Character matrix gt rows: 130285
Character matrix gt cols: 12
skip: 0
nrows: 130285
row_num: 0
Processed variant: 130285
All variants processed
Found GT genotype format in cluster vcf. Will use that metric for cluster correlation.
Detected / separator for GT genotype format in cluster vcf
Found GT genotype format in reference vcf. Will use that metric for cluster correlation.
Detected / separator for GT genotype format in reference vcf
Found REF and ALT in both cluster and reference genotype vcfs. Will use chromosome, position, REF and ALT to match SNPs.
Joining, by = "ID"
Joining, by = "ID"
Joining, by = "ID"
[1] "AYM-4-071"
[1] "0"
Error in cor(as.numeric(pull(ref_df, col)), as.numeric(pull(clust_df, : no complete element pairs
Calls: pearson_correlation -> cor
Execution halted

The reference vcf file's header is:

##fileformat=VCFv4.2
##FILTER=
##fileDate=20231206
##source=PLINKv1.90
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##INFO=
##FORMAT=
##INFO=
##INFO=
##bcftools_viewVersion=1.15.1-21-gf2694e5+htslib-1.15.1-40-g226c1a8
##bcftools_viewCommand=view --types snps -q 0.05:minor -m2 -M2 -Oz -o peru139.chile284.MAF0.05.vcf.gz peru139.chile284.vcf; Date=Wed Dec 6 14:01:01 2023
##bcftools_viewCommand=view -s AYM-4-071,CHI-0-060-2,MAP-9-019,QUE-4-038,PER025,PER116,AYM-4-078,CHI-0-059,MAP-9-020,QUE-4-042,PER031,PER110,AYM-4-073,CHI-0-058,MAP-9-017,QUE-4-047,PER030,AYM-4-077,CHI-0-061,MAP-9-016,PER112,PER115,CHI-0-063,PER092,PER088,PER106 -Oz -o /project/PI/USERS/me/czi/data/scrna/fastqs/skip_remap_OUTS/LB-CC-20s-01-NI-GEX-b1-FC02_souporcell/batch1.MAF0.05.vcf.gz peru139.chile284.MAF0.05.vcf.gz; Date=Sun Jan 14 18:02:26 2024
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AYM-4-071 CHI-0-060-2 MAP-9-019 QUE-4-038 PER025 PER116 AYM-4-078 CHI-0-059 MAP-9-020 QUE-4-042 PER031 PER110 AYM-4-073 CHI-0-058 MAP-9-017 QUE-4-047 PER030 AYM-4-077 CHI-0-061 MAP-9-016 PER112 PER115 CHI-0-063 PER092 PER088 PER106

The header of cluster_genotypes.vcf is:

##fileformat=VCFv4.3
##fileDate=05122018_15h52m43s
##source=IGSRpipeline
##reference=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##FILTER=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 0 1 2

chr1 631712 . T C . PASS AF=0.02;AC=88;NS=2548;AN=5096;EAS_AF=0.01;EUR_AF=0.0;AFR_AF=0.04;AMR_AF=0.03;SAS_AF=0.01;VT=SNP;DP=17330 GT:AO:RO:T:E:GO:GN 1/1:1:0:-2:-6:-3,-1,-2:-2,0,-1 0/0:0:3:-2:-6:-1,-3,-2:0,-2,-1 ./.:0:0:-2:-6:-1,-1,-1:-1,-1,-1

Honestly, I have no idea what the problem is. Please help!

drneavin commented 9 months ago

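Hi @MrLocuace, it looks like the chromosomes are encoded with a chr prefix in one file (chr1, chr2...) but not in the other (1, 2...). Because the script matches SNPs on chromosome, position, REF and ALT, that mismatch would leave no SNPs in common, which is consistent with the "no complete element pairs" error from cor. I would recommend updating one of the files so that both use the same chromosome encoding and can be compared directly.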
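For anyone landing here with the same error, below is a minimal Python sketch of one way to check for and fix this kind of mismatch. It is not part of Demuxafy; the function names (`chrom_names`, `add_chr_prefix`) and the example lines are made up for illustration, and it assumes the reference VCF is the file without the chr prefix. `bcftools annotate --rename-chrs` with a two-column mapping file is another way to do the renaming.

```python
import gzip
import re


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed VCF in text mode."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def chrom_names(lines):
    """Return the set of CHROM values found in the data lines of a VCF."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def add_chr_prefix(lines):
    """Yield VCF lines with 'chr' prepended to CHROM values and contig IDs.

    Lines that already carry the prefix are passed through unchanged.
    Caveat: a mitochondrial contig named 'MT' becomes 'chrMT', which may
    still not match a file that uses 'chrM'.
    """
    for line in lines:
        if line.startswith("##contig=<ID="):
            # Keep the header consistent with the renamed records.
            yield re.sub(r"^##contig=<ID=(?!chr)", "##contig=<ID=chr", line)
        elif line.startswith("#") or line.startswith("chr") or not line.strip():
            yield line
        else:
            yield "chr" + line


# Hypothetical example lines standing in for the two files.
reference = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=1,length=100>\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t631712\t.\tT\tC\n",
]
cluster = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t631712\t.\tT\tC\n",
]

# An empty intersection means no SNP can match on chromosome name.
shared_before = chrom_names(reference) & chrom_names(cluster)
fixed_reference = list(add_chr_prefix(reference))
shared_after = chrom_names(fixed_reference) & chrom_names(cluster)
print(shared_before, shared_after)
```

To apply this to real files, the lines can be read with `open_text(path)` and the output of `add_chr_prefix` written to a new file. Checking `chrom_names` on both files first shows which one needs changing, if either.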

MrLocuace commented 9 months ago

Thank you! That solved the issue.

drneavin commented 9 months ago

Glad it was a fast fix!