Kallisto Bus doesn't give output

rbarbieri86 commented 4 weeks ago

Hello,

I am trying to use scUTRquant to analyze a small scRNA-seq dataset (3 controls, 3 alternate). I have written the config.yaml and the sample.csv and placed them in a different folder from the scUTRquant one. Both files should be correctly referenced by their full paths; however, I get this error:

snakemake --use-conda --cores 8 --configfile ~/athero_scRNA/config.yaml
Config file config.yaml is extended by additional config specified via the command line.
[INFO] scUTRquant v0.5.0
[INFO] Loading sample data...
[INFO] Loaded 2 samples.
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
MissingInputException in rule mtxs_to_sce_genes in file /q/home/barbieri/scUTRquant/Snakefile, line 315:
Missing input files for rule mtxs_to_sce_genes:
    output: data/sce/utrome_mm10_v2/Athero_Clement.genes.Rds
    wildcards: target=utrome_mm10_v2
    affected files:
        data/kallisto/utrome_mm10_v2/A9/genes.barcodes.txt
        data/kallisto/utrome_mm10_v2/A10/genes.mtx
        data/kallisto/utrome_mm10_v2/A9/genes.mtx
        data/kallisto/utrome_mm10_v2/A10/genes.barcodes.txt
        data/kallisto/utrome_mm10_v2/A10/genes.genes.txt
        data/kallisto/utrome_mm10_v2/A9/genes.genes.txt

So it looks like kallisto bus is not providing the expected output. For reference here is the content of the config.yaml I am using:

dataset_name: "Athero_Clement"
sample_file: "/q/home/barbieri/athero_scRNA/Athero_Clement.csv"
sample_regex: ".fastq"
tech: "10xv2"
strand: "--fr-stranded"
cell_annots: null
cell_annots_key: "cell_id"
exclude_unannotated_cells: False
target: "utrome_mm10_v2"
output_type:
  - "genes"
  - "txs"
output_format:
  - "sce"
tmp_dir: "/q/home/barbieri/athero_scRNA/tmp"
bx_whitelist: "/q/home/barbieri/scUTRquant/extdata/bxs/737K-august-2016.txt"
correct_bus: True
min_umis: 500
targets_config: "/q/home/barbieri/scUTRquant/extdata/targets/targets.yaml"
use_hdf5: False
include_reports: True

Many thanks in advance for any pointers :)

mfansler commented 3 weeks ago

Thanks for your interest and sorry you're having issues getting it going! I'm happy to help resolve the issue.

At first glance, here are some things that come to mind:

- sample_regex is usually not needed; perhaps try removing that
- would you mind also posting the contents of the sample_file so I can confirm that looks good
- have you had any success running one of the examples? that can be a good way to verify everything else is working
- the fact that only the genes output is an issue is unusual; perhaps try leaving that off for now; the txs output is usually the crucial one to get working; worst case, I can show you how to convert from tx- to gene-level output

I apologize in advance if I'm overlooking something. Currently traveling, so I may not be as responsive until Monday.

mfansler commented 3 weeks ago

Also, the message "Loaded 2 samples." does not match the description of 2 conditions with 3 replicates each. I would expect it to report 6 samples if the sample sheet were correct.
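To make that check concrete, here is a minimal sketch that counts the samples and files in a sample sheet before running the pipeline. It assumes the three-column layout used in this thread (sample_id, file_type, files, with paths separated by ";"); the function name `summarize_sample_sheet` and the shortened paths in the toy sheet are made up for illustration and are not part of scUTRquant.

```python
import csv
import io


def summarize_sample_sheet(handle):
    """Return {sample_id: [file paths]} from a sample sheet.

    Assumes columns sample_id, file_type, files, with the paths in
    `files` separated by ';' (the layout shown in this thread).
    """
    samples = {}
    for row in csv.DictReader(handle):
        # Drop empty entries that a trailing ';' would produce.
        paths = [p for p in row["files"].split(";") if p]
        samples[row["sample_id"]] = paths
    return samples


# Toy sheet with shortened, hypothetical paths (not the real files).
demo = io.StringIO(
    "sample_id,file_type,files\n"
    "A9,fastq,A9_R1_001.fastq;A9_R2_001.fastq;A9_R1_002.fastq;A9_R2_002.fastq\n"
    "A10,fastq,A10_R1_001.fastq;A10_R2_001.fastq\n"
)
samples = summarize_sample_sheet(demo)
print(f"{len(samples)} samples")
for sample_id, paths in samples.items():
    print(sample_id, len(paths), "files")
```

If the number of samples printed here differs from the number of libraries you expect, the sheet probably groups several libraries under one sample_id. To check a real sheet, pass an open file handle instead of the `io.StringIO` object.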

rbarbieri86 commented 3 weeks ago

First of all thanks for your reply, I will try to address your points.

- Ok, I will remove the sample_regex.
- Please look below.
- Yes, I have run both of the suggested examples with success.
- I can remove the genes part; I am mostly interested in the APA.
- You might be right: I obtained the FASTQ files using 10x bamtofastq, and it does look like the 2 BAM files contained 3 samples each. I will try to restructure the sample_file accordingly.

sample_id,file_type,files
A9,fastq,/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R1_001.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R2_001.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R1_002.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R2_002.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R1_003.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R2_003.fastq
A10,fastq,/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R1_001.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R2_001.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R1_002.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R2_002.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R1_003.fastq;/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/bamtofastq_S1_L007_R2_003.fastq

rbarbieri86 commented 3 weeks ago

Hello, just wanted to report that modifying the sample_file solved the issue. Mine looks like this now:

sample_id,file_type,files
A9_1,fastq,/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/A9_1/A9_1_bamtofastq_S1_L006_R1_001.fastq.gz;/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/A9_1/A9_1_bamtofastq_S1_L006_R2_001.fastq.gz
A9_2,fastq,/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/A9_2/A9_2_bamtofastq_S1_L006_R1_002.fastq.gz;/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/A9_2/A9_2_bamtofastq_S1_L006_R2_002.fastq.gz
A9_3,fastq,/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/A9_3/A9_3_bamtofastq_S1_L006_R1_003.fastq.gz;/q/home/barbieri/athero_scRNA/Fastq_files/A9/A9_MissingLibrary_1_CB54YACXX/A9_3/A9_3_bamtofastq_S1_L006_R2_003.fastq.gz
A10_1,fastq,/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/A10_1/A10_1_bamtofastq_S1_L007_R1_001.fastq.gz;/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/A10_1/A10_1_bamtofastq_S1_L007_R2_001.fastq.gz
A10_2,fastq,/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/A10_2/A10_2_bamtofastq_S1_L007_R1_002.fastq.gz;/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/A10_2/A10_2_bamtofastq_S1_L007_R2_002.fastq.gz
A10_3,fastq,/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/A10_3/A10_3_bamtofastq_S1_L007_R1_003.fastq.gz;/q/home/barbieri/athero_scRNA/Fastq_files/A10/A10_MissingLibrary_1_CB54YACXX/A10_3/A10_3_bamtofastq_S1_L007_R2_003.fastq.gz

A quick question: does the sample_id need to be part of the filenames, or is that unnecessary?

Thanks again.

mfansler commented 3 weeks ago

Great to hear you have it working!

No, the values in the sample_id and files columns don't need to be coordinated. Consider the sample_id a good opportunity to rename each sample to something more informative.
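As a rough illustration of that renaming, the sketch below writes a sample sheet from a mapping of freely chosen sample IDs to FASTQ paths. The labels (`control_rep1`, `treated_rep1`), the shortened paths, and the helper `write_sample_sheet` are all hypothetical; only the three-column layout with ";"-separated paths is taken from this thread.

```python
import csv
import io


def write_sample_sheet(handle, samples):
    """Write rows of sample_id, file_type, files (paths joined by ';')."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample_id", "file_type", "files"])
    for sample_id, paths in samples.items():
        writer.writerow([sample_id, "fastq", ";".join(paths)])


# Hypothetical labels and shortened paths: the sample_id does not
# have to appear anywhere in the file names.
samples = {
    "control_rep1": ["A9_1_R1_001.fastq.gz", "A9_1_R2_001.fastq.gz"],
    "treated_rep1": ["A10_1_R1_001.fastq.gz", "A10_1_R2_001.fastq.gz"],
}
buffer = io.StringIO()
write_sample_sheet(buffer, samples)
sheet = buffer.getvalue()
print(sheet)
```

Replacing the `io.StringIO` buffer with a file opened for writing would produce a sheet on disk; which library belongs to which condition is something only you can fill in.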

Mayrlab / scUTRquant

Kallisto Bus doesn't give output #92