chalkidiki commented 4 years ago

Dear Schulz Lab,

thank you for providing such an interesting tool.

Whether I run the fusion detection or the quantification tool with the references downloaded from Ensembl, I get the error messages below. They look similar to one of the previously opened issues. I would be very happy to get any hint on how to solve this. My commands and the error messages follow.

snakemake --cores 40 all -s Snakefile_fusion
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
MissingInputException in line 127 of ./Aeron/Snakefile_fusion:
Missing input files for rule sam_to_bam:
fusiontmp/reads_tofusions_P_HomouusapiensuuGRCh38uucdnauuall_ens92uuhg38.sam

snakemake --cores=10
MissingInputException in line 41 of ./Aeron/Snakefile:
Missing input files for rule align:
input/r

Thank you!

ddurai commented 4 years ago

Dear Marius, I apologize for the late response. I am not able to replicate the error on my side.

Can you please provide me with your config file?

A possible cause might be the string you give in the input parameter of the config file. You just need to give the file name, not a path of the form "input/filename". For instance, if the input folder contains an input file "reads.fq", then the config file should contain "reads.fq" and not "input/reads.fq".
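The rule described here (bare file name, no "input/" prefix) can be checked before starting a run. The following is only a minimal sketch in Python and is not part of Aeron; the helper name check_bare_filename is hypothetical.

```python
import os


def check_bare_filename(value):
    """Return None if value is a bare file name, otherwise a short hint.

    Hypothetical helper for illustration; Aeron does not ship this check.
    """
    # os.path.basename drops any leading directory such as "input/".
    bare = os.path.basename(value)
    if bare != value:
        return f"use '{bare}' instead of '{value}'"
    return None


print(check_bare_filename("reads.fq"))
print(check_bare_filename("input/reads.fq"))
```

The first call prints None (the value is fine), the second prints a hint that names the bare file name to use.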

Regards Dilip A Durai

chalkidiki commented 4 years ago

Dear Dilip, thank you so much for your response. You will find my config file below. Would it be a good idea anyway to download the newest version of Aeron?

Best,

Marius

# Should be in the input folder
# format must be .vg
graph: ens92uuhg38.gfa

# reference transcripts
# format can be either fasta/fastq, gzipped or not
# Should be in the input folder
transcripts: HomouusapiensuuGRCh38uucdnauuall.fa

# sequenced reads
# Should be in the input folder
# format can be either fasta/fastq, gzipped or not
# for more files, add them in new lines starting with "- "
# NOTE: the file names without ending must be unique! You cannot have eg. reads.fq and reads.fa
reads: BePiuudirectuucDNA.fq

# optional params below: default values will probably work

# size of the seed hits. Fewer means more accurate but slower alignments.
seedsize: 17

# max number of seeds. Fewer means faster but more inaccurate alignment
maxseeds: 20

alignment_selection: --greedy-length #Do not change
alignment_E_cutoff: 1
fusion_max_error_rate: 100
fusion_min_score_difference: 10

# bandwidth for the aligner. Higher means more accurate but slower alignment.
aligner_bandwidth: 35

gtffile: HomouusapiensuuGRCh38uu92.gtf

# file paths
scripts: AeronScripts/
binaries: Binaries/

# needed to convert mummer seeds to .gam seeds
vgpath: /NGS/vg/bin/vg

SchulzLab commented 4 years ago

Thanks Marius, I agree that it is a good idea to test the latest checkout of Aeron, and look over the improved Readme.

Hope this helps, Marcel

chalkidiki commented 4 years ago

Thank you! I did exactly that: I downloaded the files from GitHub into a new repository.

It seems that I get the same error message for both commands. Could it be a parse error?

snakemake --cores 40 all -s Snakefile_fusion
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
MissingInputException in line 127 of ./Aeron/Snakefile_fusion:
Missing input files for rule sam_to_bam:
fusiontmp/reads_tofusions_onlyfusion_t_HomouusapiensuuGRCh38uucdnauuall_ens92uuhg38.sam

snakemake --cores=10
MissingInputException in line 41 of /media/data/mkle/Aeron/Snakefile:
Missing input files for rule align:
input/u

Here is my config.yaml:

# input files at top: check them!
# all input files must be in the folder ./input/
# use the full file name, including file ending

# input splice graph
# Should be in the input folder
# format must be .vg
graph: ens92uuhg38.gfa

# reference transcripts
# format can be either fasta/fastq, gzipped or not
# Should be in the input folder
transcripts: HomouusapiensuuGRCh38uucdnauuall.fa

# sequenced reads
# Should be in the input folder
# format can be either fasta/fastq, gzipped or not
# for more files, add them in new lines starting with "- "
# NOTE: the file names without ending must be unique! You cannot have eg. reads.fq and reads.fa
reads: BePiuudirectuucDNA.fq

# Needed for expression quantification
# Should be in the input folder
gtffile: HomouusapiensuuGRCh38uu92.gtf

# needed to convert between alignment formats
# https://github.com/vgteam/vg
vgpath: /NGS/vg/bin/vg

# optional parameters below: default values will probably work
fusion_max_error_rate: 0.2
fusion_min_score_difference: 200

# size of the seed hits. Fewer means more accurate but slower alignments.
seedsize: 17

# max number of seeds. Fewer means faster but more inaccurate alignment
maxseeds: 20

# No need to change these
aligner_bandwidth: 35
alignment_selection: --greedy-length
alignment_E_cutoff: 1

scripts: AeronScripts
binaries: Binaries
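Since the config comments require every input file to be in ./input/, a quick pre-flight check can rule out simple path mistakes before snakemake is started. This is a sketch only and not part of Aeron: the function name missing_inputs is hypothetical, and it assumes the config has already been loaded into a Python dict by some YAML parser.

```python
import os


def missing_inputs(config, input_dir="input"):
    """List configured input files that are not present in input_dir.

    Hypothetical helper; the keys checked are the ones shown in the
    config above (graph, transcripts, gtffile, reads).
    """
    names = []
    for key in ("graph", "transcripts", "gtffile"):
        if key in config:
            names.append(config[key])
    # "reads" may be a single file name or a list of file names.
    reads = config.get("reads", [])
    names.extend([reads] if isinstance(reads, str) else reads)
    return [n for n in names if not os.path.isfile(os.path.join(input_dir, n))]


config = {
    "graph": "ens92uuhg38.gfa",
    "transcripts": "HomouusapiensuuGRCh38uucdnauuall.fa",
    "reads": ["BePiuudirectuucDNA.fq"],
    "gtffile": "HomouusapiensuuGRCh38uu92.gtf",
}
print(missing_inputs(config))
```

An empty list means every configured file was found in the input folder. A non-empty list names the files to look for first.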

ddurai commented 4 years ago

Dear Marius, thanks for providing us with the config file. I was able to replicate the error. The problem is that the pipeline expects the input files as a list (since there may be multiple input files). You can rerun the program after changing the reads parameter of your config file: insert a line break after "reads:" and put the input file name on its own line, starting with "- ". For instance:

reads:
- BePiuudirectuucDNA.fq
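To illustrate why the list form matters: in Python, looping over a plain string yields single characters, which is one way a path can end up truncated to something like "input/" plus one letter. The sketch below is not Aeron's actual code; as_file_list and input_paths are hypothetical names used only for illustration.

```python
def as_file_list(value):
    """Accept either one file name or a list of file names."""
    if isinstance(value, str):
        return [value]
    return list(value)


def input_paths(reads):
    # Mimics a pipeline step that builds one path per entry in `reads`.
    return ["input/" + name for name in reads]


single = "BePiuudirectuucDNA.fq"
print(input_paths(single)[:2])            # ['input/B', 'input/e']
print(input_paths(as_file_list(single)))  # ['input/BePiuudirectuucDNA.fq']
```

Wrapping a single string in a list before iterating makes both spellings of the config entry behave the same way.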

Hope this helps. I apologize for the inconvenience

regards Dilip

maickrau commented 4 years ago

Hi Marius,

The newest version in commit 7587de6 now parses the read file when it is in the same line.

Br1anChou commented 4 years ago

Hi, I have a similar problem. When I run "snakemake --cores 10 all -s Snakefile_fusion", I get:

wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
Building DAG of jobs...
MissingInputException in line 127 of /software/Aeron/Snakefile_fusion:
Missing input files for rule sam_to_bam:
fusiontmp/reads_tofusions_onlyfusion_D_ReferenceTranscriptFastaFile_HumanUpdated38V5.sam

It looks like some files are missing, but I don't know how to get them. I would appreciate it if you can provide some necessary test files and corresponding results.

chalkidiki commented 4 years ago

Hi Dilip and Maickrau,

I downloaded the newest version and also changed the config.yaml as you suggested:

reads:

BePiuudirectuucDNA.fq

Now I get at least a different error message:

snakemake --cores 40 all -s Snakefile_fusion
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
MissingInputException in line 59 of ./Aeron/Snakefile_fusion:
Missing input files for rule pair_assignments:
output/aln_HomouusapiensuuGRCh38uucdnauuall_ens92uuhg38_full_length.gam

snakemake --cores=10
MissingInputException in line 152 of ./Aeron/Snakefile:
Missing input files for rule output_alignment_statistics:
tmp/aligner_stderr_HomouusapiensuuGRCh38uucdnauuall_ens92uuhg38.txt
tmp/aligner_stdout_HomouusapiensuuGRCh38uucdnauuall_ens92uuhg38.txt

Thanks for your help! Would be awesome to get your tool working quickly!

regards,

Marius

maickrau commented 4 years ago

Hi,

The previous commit fixed this for quantification but missed it for the fusion pipeline, 0495121 fixes this for the fusion detection as well.

chalkidiki commented 4 years ago

Hi, unfortunately I still get the new error message I posted above, which is different from the one I initially got, even when using the updated version and a config file like this:

reads:

BePiuudirectuucDNA.fq

I would be very happy if you would help me to solve this problem.

Best,

Marius

chalkidiki commented 4 years ago

Dear Schulz Lab,

I would be very happy if you could help me to get your tool finally running. I can also provide you with more information if you want to.

Best,

Marius

chalkidiki commented 4 years ago

Dear Schulz Lab,

I did a git pull again, wrote a new config.yaml according to what you told me here, and I still get error messages. When I run the quantification tool, the errors vary between runs:

snakemake --cores=10
MissingInputException in line 123 of /media/data/mkle/Aeron/Snakefile:
Missing input files for rule output_assignment_statistics:
tmp/aligner_stdout_HomouusapiensuuGRCh38uucdnauuall_ens92uuhg38.txt
tmp/aligner_stdout_BePiuudirectuucDNA_ens92uuhg38.txt

snakemake --cores=10
MissingInputException in line 152 of /media/data/mkle/Aeron/Snakefile:
Missing input files for rule output_alignment_statistics:
tmp/aligner_stderr_HomouusapiensuuGRCh38uucdnauuall_ens92uuhg38.txt
tmp/aligner_stdout_HomouusapiensuuGRCh38uucdnauuall_ens92uuhg38.txt

When I run the fusion detection tool I always get this:

snakemake --cores 40 all -s Snakefile_fusion
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
wildcard constraints in inputs are ignored
MissingInputException in line 61 of /media/data/mkle/Aeron/Snakefile_fusion:
Missing input files for rule pair_assignments:
output/aln_HomouusapiensuuGRCh38uucdnauuall_ens92uuhg38_full_length.gam

The config.yaml file is exactly the one I posted above, just:

reads:

BePiuudirectuucDNA.fq

changed as you suggested.

I would be very happy to finally get your tool working for us, and I have no idea what the error on my side could be.

Best,

Marius

linagapa commented 4 years ago

Dear Schulz Lab, I've got the exact same error that chalkidiki showed:

MissingInputException in line 152 of /Software/Aeron/Snakefile:
Missing input files for rule output_alignment_statistics:
tmp/aligner_stdout_BC09_HumanUpdated38V5.txt
tmp/aligner_stderr_BC09_HumanUpdated38V5.txt

I would deeply appreciate any help on this issue since it is unclear to me what would be the source of the problem.

Best regards, Lina.

chalkidiki commented 4 years ago

Dear Schulz Lab,

I did a fresh git pull, downloaded the reference genome, transcriptome and GTF file from ENCODE, and uploaded them again. I also used a different Nanopore direct cDNA sample as input. However, I now get a different error message:

MissingInputException in line 123 of /media/data/mkle/Aeron/Snakefile:
Missing input files for rule output_assignment_statistics:
tmp/aligner_stdout_814cDNA_hg38.txt
tmp/aligner_stdout_Homosapiens_hg38.txt

Is there a way to deal with this error? Or do you think it is somehow the same one as before? It would be great to be able to run your tool. Could you also send me details of how your system is set up when you run Aeron? Maybe we could make use of this to finally get Aeron running.

Best,

Marius

noncodo commented 4 years ago

I'm getting the same error as @chalkidiki:

Building DAG of jobs...
MissingInputException in line 123 of /home/apps/Aeron/Snakefile:
Missing input files for rule output_assignment_statistics:
tmp/aligner_stdout_280filt_hg38controlgencodeV35.txt
tmp/aligner_stdout_gencodev35controlannotation_hg38controlgencodeV35.txt

As a side note, the readme should also mention that, in addition to underscores, no periods should be present in the file name (except for the file extension).
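The naming constraints mentioned in this thread (no underscores or extra periods in the file name, and unique names once the ending is removed) could be checked with a few lines of Python. This is a sketch under stated assumptions, not something Aeron provides: stem and naming_problems are hypothetical names, and treating a trailing ".gz" as part of the ending is my assumption based on the "gzipped or not" comment in the config.

```python
import os


def stem(filename):
    """File name without its ending; a trailing .gz is dropped first (assumption)."""
    if filename.endswith(".gz"):
        filename = filename[:-3]
    return os.path.splitext(filename)[0]


def naming_problems(filenames):
    """Return human-readable problems for a list of input file names."""
    problems = []
    seen = {}
    for name in filenames:
        s = stem(name)
        # Underscores and extra periods are the characters flagged in this thread.
        if "_" in s or "." in s:
            problems.append(f"{name}: '_' or '.' in the name before the ending")
        # Names must stay unique once the ending is removed.
        if s in seen:
            problems.append(f"{name}: same name without ending as {seen[s]}")
        seen.setdefault(s, name)
    return problems


print(naming_problems(["reads.fq", "reads.fa", "Homo_sapiens.cdna.fa"]))
```

Running this on the planned file names before editing config.yaml gives an early warning for both kinds of naming problem.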

SchulzLab / Aeron

MissingInputException in line 127 of ./Aeron/Snakefile_fusion: Missing input files for rule sam_to_bam: #7
