drneavin / Demultiplexing_Doublet_Detecting_Docs

MIT License
10 stars 1 forks source link

Assign_Indiv_by_Geno.R function of Demuxafy.sif #13

Closed ElenaSLuis closed 11 months ago

ElenaSLuis commented 1 year ago

hey!

I'm trying to use Assign_Indiv_by_Geno.R function of Demuxafy.sif from data of freedemuxlet code. It seems the problem is with the FORMAT, but I don't undestand very well...

apptainer exec Demuxafy.sif Assign_Indiv_by_Geno.R -r /home/elenasl/scRNAseq/freemuxlet/vcf_chr/GRCh38_1000G_MAF0.01_GeneFiltered_ChrEncoding_Sorted_polyIC.vcf \

-c /home/elenasl/scRNAseq/freemuxlet/output_polyIC.clust1.vcf.gz -o /home/elenasl/scRNAseq/freemuxlet/ System has not been booted with systemd as init system (PID 1). Can't operate. Failed to connect to bus: Host is down Scanning file to determine attributes. File attributes: meta lines: 242 header_line: 243 variant count: 7139809 column count: 8 Meta line 242 read in. All meta lines processed. gt matrix initialized. Character matrix gt created. Character matrix gt rows: 7139809 Character matrix gt cols: 8 skip: 0 nrows: 7139809 row_num: 0 Processed variant: 7139809 All variants processed Scanning file to determine attributes. File attributes: meta lines: 32 header_line: 33 variant count: 2204998 column count: 11 Meta line 32 read in. All meta lines processed. gt matrix initialized. Character matrix gt created. Character matrix gt rows: 2204998 Character matrix gt cols: 11 skip: 0 nrows: 2204998 row_num: 0 Processed variant: 2204998 All variants processed Found GT genotype format in cluster vcf. Will use that metric for cluster correlation. Detected / separator for GT genotype format in cluster vcf

Error in if (colnames(x@gt)[1] != "FORMAT") { : argument is of length zero Calls: as_tibble -> extract.gt Execution halted

If you could help me..

thank you in advance,

Elena

drneavin commented 1 year ago

Hi Elena,

Thanks for reaching out again about issues that you run into with Demuxafy!

It seems that the Assign_Indiv_by_Geno.R script is having problems parsing the reference vcf that you've provided. To help me identify the problem, could you send me the head of each of the reference file that contains both the header lines and one or two lines of data (ie all the lines with # before them and then a couple that have the snps on them). Based on the information you've provided that would be

head -n 245 /home/elenasl/scRNAseq/freemuxlet/vcf_chr/GRCh38_1000G_MAF0.01_GeneFiltered_ChrEncoding_Sorted_polyIC.vcf

Thanks! -Drew

ElenaSLuis commented 1 year ago

Hey Drew,

I send you all I remove the contigs : ##contig= because there were a lot. But this is the output:

fileformat=VCFv4.1

FILTER=

fileDate=20150218

reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

source=1000GenomesPhase3Pipeline

FORMAT=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

bcftools_filterVersion=1.9+htslib-1.9

bcftools_filterCommand=filter --include 'AF>=0.01 & AF<=0.99' -O v --output /directflow/SCCGGroupShare/projects/DrewNeavin/References/GRCh38SNPvcfs1000genomes/MergedAutosomesFilteredGenes.recode.MAF0.01.vcf /directflow/SCCGGroupShare/projects/DrewNeavin/References/GRCh38SNPvcfs1000genomes/MergedAutosomesFilteredGenes.recode.vcf; Date=Wed Feb 26 12:30:05 2020

bcftools_viewVersion=1.16+htslib-1.16

bcftools_viewCommand=view -h /home/elenasl/scRNAseq/freemuxlet/vcf_chr/GRCh38_1000G_MAF0.01_GeneFiltered_ChrEncoding.vcf; Date=Tue Jan 3 09:32:22 2023

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

CHROM POS ID REF ALT QUAL FILTER INFO

chr1 58814 rs114420996 G A 100 PASS AC=546;AF=0.109026;AN=5008;NS=2504;DP=24093;EAS_AF=0.0228;AMR_AF=0.1772;AFR_AF=0.1967;EUR_AF=0.0676;SAS_AF=0.0736;AA=g|||;VT=SNP;GRCH37_POS=58814;GRCH37_REF=G;GRCH37_38_REF_STRING_MATCH chr1 59040 rs62637815 T C 100 PASS AC=308;AF=0.0615016;AN=5008;NS=2504;DP=16792;EAS_AF=0.0228;AMR_AF=0.1744;AFR_AF=0.0038;EUR_AF=0.0775;SAS_AF=0.0828;AA=t|||;VT=SNP;GRCH37_POS=59040;GRCH37_REF=T;GRCH37_38_REF_STRING_MATCH

Elena

drneavin commented 1 year ago

Hi Elena,

It looks like there aren't any individuals in your reference vcf so there's no way to correlate between the reference vcf and the clusters. Have you snp genotyped the individuals in pooled in your single cell capture?

ElenaSLuis commented 1 year ago

Hi Drew,

What I did befor was acording the Demuxafi.sif manual..insteas of with singularity it was with apptainer but is the same way, so I send you the pipeline:

[elenasl@cicdna ~]$

sort

/home/elenasl/scRNAseq/freemuxlet/sort_vcf_same_as_bam.sh \ /home/elenasl/scRNAseq/cellranger/run_msc_polyIC_Ranger_count/outs/possorted_genome_bam.bam \ /home/elenasl/scRNAseq/freemuxlet/vcf_chr/GRCh38_1000G_MAF0.01_GeneFiltered_ChrEncoding.vcf \

/home/elenasl/scRNAseq/freemuxlet/vcf_chr/GRCh38_1000G_MAF0.01_GeneFiltered_ChrEncoding_Sorted_polyIC.vcf

filter bam

/home/elenasl/scRNAseq/freemuxlet/filter_bam_file_for_popscle_dsc_pileup.sh \ /home/elenasl/scRNAseq/cellranger/run_msc_polyIC_Ranger_count/outs/possorted_genome_bam.bam \ /home/elenasl/scRNAseq/cellranger/run_msc_polyIC_Ranger_count/outs/filtered_feature_bc_matrix/barcodes.tsv.gz \ /home/elenasl/scRNAseq/freemuxlet/vcf_chr/GRCh38_1000G_MAF0.01_GeneFiltered_ChrEncoding_Sorted_polyIC.vcf \ /home/elenasl/scRNAseq/freemuxlet/data_polyIC/samples_to_demultiplex.filter_bam_file_for_popscle_dsc_pileup.bam > outbam apptainer exec --bind /home/elenasl/scRNAseq:/home/elenasl/ Demuxafy.sif popscle \

apptainer exec Demuxafy.sif popscle dsc-pileup --sam /home/elenasl/scRNAseq/freemuxlet/data_polyIC/samples_to_demultiplex.filter_bam_file_for_popscle_dsc_pileup.bam \ --vcf /home/elenasl/scRNAseq/freemuxlet/vcf_chr/GRCh38_1000G_MAF0.01_GeneFiltered_ChrEncoding_Sorted_polyIC.vcf --out /home/elenasl/scRNAseq/freemuxlet/pileup_polyIC apptainer exec Demuxafy.sif popscle freemuxlet --plp /home/elenasl/scRNAseq/freemuxlet/pileup_polyIC --out /home/elenasl/scRNAseq/freemuxlet/output_polyIC --nsample 2

apptainer exec Demuxafy.sif bash Freemuxlet_summary.sh /home/elenasl/scRNAseq/freemuxlet/output_polyIC.clust1.samples.gz > /home/elenasl/scRNAseq/freemuxlet/freemuxlet_summarypolyIC.tsv

apptainer exec Demuxafy.sif Assign_Indiv_by_Geno.R -r /home/elenasl/scRNAseq/freemuxlet/vcf_chr/GRCh38_1000G_MAF0.01_GeneFiltered_ChrEncoding_Sorted_polyIC.vcf \ -c /home/elenasl/scRNAseq/freemuxlet/output_polyIC.clust1.vcf.gz -o /home/elenasl/scRNAseq/freemuxlet/

Elena

drneavin commented 1 year ago

Hi Elena,

As we saw with your previous post, the apptainer should be fine with singularity, which is great!

The Assign_Indiv_by_Geno is an optional step with Freemuxlet for people who have SNP genotyped each of the individuals in their pool, or have some other measure of the SNPs for each individual (WGS, WES, single cell for each individual separately etc.); Do you have SNP genotype information for the individuals in your pool?

ElenaSLuis commented 1 year ago

no I haven't.. my question maybe is how I can now move this data to an Seurat and see how was be classified each sample (there were 2) of the pool? so in the manual of demuxafy says that with the function: singularity exec Demuxafy.sif Combine_Results.R \ you can obtain results to add to seurat. But you need : -g FREEMUXLET_ASSIGNMENTS] and [-a FREEMUXLET_CORRELATION_LIMIT] so I am little lost about how to do that.

Thank you!

Elena

drneavin commented 1 year ago

Hi Elena, you can do it without the -g and -a flags. Just use the flags for the parameters that you have. Which methods have you run?

I also provide some pseudocode in the 'Next Steps' section to help with adding the results to a Seurat object.

Let me know if that helps resolve your questions :)

ElenaSLuis commented 1 year ago

Hey Drew,

Yes but in this section I need: combined_results_w_combined_assignments.tsv that is the output of Combine_Results.R.

what I do is: [elenasl@cicdna ~]$ apptainer exec Demuxafy.sif Combine_Results.R -o /home/elenasl/scRNAseq/freemuxlet/fremmuxlet_CTL/combineresults_ctl2 -f /home/elenasl/scRNAseq/freemuxlet/fremmuxlet_CTL System has not been booted with systemd as init system (PID 1). Can't operate. Failed to connect to bus: Host is down Reading in freemuxlet results.

Combining results.

Writing output.

Making and writing summary files.

[1] "Freemuxlet_Cluster" No intersection classification method was provided. Output does not contain combined software cell type or donor classifications. Error in names(sanitized_labels) <- non_sanitized_labels : attempt to set an attribute on NULL Calls: upset -> upset_data Execution halted

and in the : /home/elenasl/scRNAseq/freemuxlet/fremmuxlet_CTL I have these files: -rw-rw-r-- 1 elenasl elenasl 64 Jan 11 22:25 freemuxlet_summaryCTL.tsv -rw-rw-r-- 1 elenasl elenasl 735852 Jan 11 22:01 output_CTL.clust1.samples.gz -rw-rw-r-- 1 elenasl elenasl 58574078 Jan 11 22:01 output_CTL.clust1.vcf.gz -rw-rw-r-- 1 elenasl elenasl 311634 Jan 6 21:52 pileup_CTL.cel.gz -rw-rw-r-- 1 elenasl elenasl 169393890 Jan 6 22:04 pileup_CTL.plp.gz -rw-rw-r-- 1 elenasl elenasl 1864188639 Jan 6 22:02 pileup_CTL.umi.gz -rw-rw-r-- 1 elenasl elenasl 67295060 Jan 6 22:04 pileup_CTL.var.gz

I have no idea of the errror so I suppose I'm doing something wrong.

Elena

drneavin commented 1 year ago

Hi Elena,

I may not have accounted for situations in which users provide a single method since the output from each method can be easily added to a seurat object on its own since it already has the Barcodes. I could add functionality to the Combining script to allow users to input just one method so that the formatting is consistent with software combinations. Let me know if you think that would be useful and I can add it to the next release (I'm finalizing a release now so it won't be in 2.0.1 but I could add it to the one after that).

However, you can just use the freemuxlet results directly since it looks like you're not using any other software?

I think the only difference is that you will have all the freemuxlet columns instead of the subset that the Combine script uses.

In this case you would use the freemuxlet.clust1.samples.gz file instead of thecombined_results_w_combined_assignments.tsv file in the pseudocode for Next Steps. You will also have to use the BARCODE column instead of the Barcode column for assigning rownames. So it would look like this:

.libPaths("/usr/local/lib/R/site-library") ### This is required so that R uses the libraries loaded in the image and not any local libraries
library(Seurat)
library(tidyverse)

## Read in the data
counts_matrix <- Read10X(data.dir = "/path/to/10x/matrix/directory/")

## Create a seurat object that contains the counts
seurat <- CreateSeuratObject(counts = counts_matrix, min.cells = 3, min.features = 200)

## Read in the demultiplexing and doublet detecting results
demuxafy <- read.table("/path/to/combined/results/freemuxlet.clust1.samples.gz", sep = "\t", header=TRUE)
rownames(demuxafy) <- demuxafy$BARCODE

## Add the demuxafy data to the Seurat object
seurat <- AddMetaData(seurat, demuxafy)

## Check that the data was correctly added
head(seurat@meta.data)

The most relevant columns are:

Let me know if you run into any issues.

drneavin commented 11 months ago

Closing due to inactivity, please reopen if you still have questions!