Closed FredoJones closed 2 years ago
It looks like the error occurs during the deep learning step. Would you mind sharing your input with me by email?
The reference genomes are:
- hg19.fasta
- GTF file from https://www.gencodegenes.org/human/release_19.html
The output folder for the scripts contains the following directories: ChiDist, ChimericOut, Expr, fastq, scFusionIndex, scripts, sniffer, STARIndex, STARMapping, utils
The script I launch is:
`module load R/3.6.0
module load genetics/broadinstitute
source ~/.bashrc
conda activate /home/users/alfredo.marchetti.stud/analisi_vdj/alfredo.marchetti/sc_SC001/utils/fusionenv
python /home/users/alfredo.marchetti.stud/analisi_vdj/alfredo.marchetti/sc_SC001/utils/scFusion-2.0.2/scFusion.py FusionCandidate \
  -d /home/users/alfredo.marchetti.stud/analisi_vdj/alfredo.marchetti/sc_SC001/scFusionIndex \
  -b 1 \
  -e 5 \
  -o /home/users/alfredo.marchetti.stud/analisi_vdj/alfredo.marchetti/sc_SC001
`
I will share the FASTQ files with you as soon as I get clearance. Do you need anything else?
Looks fine. Please also check that none of the intermediate files are empty.
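One way to run that check is to walk the output folder and list every zero-byte file. The sketch below is only an illustration, not part of scFusion: the function name `empty_files` is hypothetical, and it assumes that "empty" means a file size of 0 bytes, which will not catch files that hold only a header.

```python
from pathlib import Path


def empty_files(root):
    """Return the relative paths of all zero-byte files under root.

    Hypothetical helper: "empty" here means a size of exactly 0 bytes.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )


# Example call (the path is a placeholder for the -o output folder):
# print(empty_files("/path/to/output_folder"))
```

An empty list means no zero-byte files were found; it does not by itself show that the contents are valid.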
None of the generated folders appear to be empty. The FASTQ files look like this:
`@A00721:422:HHH2WDSX3:3:1101:2826:1000 1:N:0:CCAAGATG
NTCGTAACATTCTCATACTTCTTCAG
+
#FFFFFFFFFFFFFFFFFFFF:FFFF
@A00721:422:HHH2WDSX3:3:1101:7148:1000 1:N:0:CCAAGATG
NTGGCAATCTGTGCAAACCTGGGGAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFF
@A00721:422:HHH2WDSX3:3:1101:7744:1000 1:N:0:CCAAGATG
NCACGGATCATCTGCCAATATGTCCT
+
#FFFFFFFFFFFFFFFFFFF:FFFFF
@A00721:422:HHH2WDSX3:3:1101:8106:1000 1:N:0:CCAAGATG
NTGCTTCGTCTAGCGCGGCAGGTGTA
+
#FFFFFFFFFFFFFFFFFFF:F:F,F
@A00721:422:HHH2WDSX3:3:1101:9607:1000 1:N:0:CCAAGATG
NACTTGTCACGAAACGACCATAAATC
`
The output of the ChiDist folder looks different from previous attempts:
21M May 26 23:42 ChiDist_middle.txt
2.3M May 26 23:42 FusionRead.txt
20M May 26 23:36 Homo.txt
128 May 26 19:36 Reads.npy
128 May 26 19:36 Reads_rev.npy
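A `.npy` file of only 128 bytes is about the size of the format header alone, which would be consistent with an array that holds no elements. A quick way to confirm this is to load each file and print its shape. This is a minimal sketch with hypothetical helper names (`npy_shapes`, `empty_arrays`); it assumes the files are plain numeric arrays, so `allow_pickle=False` is used and an object array would raise an error instead of loading.

```python
from pathlib import Path

import numpy as np


def npy_shapes(folder):
    """Map each .npy file name in folder to the shape of the array it stores."""
    return {
        p.name: np.load(p, allow_pickle=False).shape
        for p in sorted(Path(folder).glob("*.npy"))
    }


def empty_arrays(folder):
    """Return the names of .npy files whose arrays contain zero elements."""
    return [
        p.name
        for p in sorted(Path(folder).glob("*.npy"))
        if np.load(p, allow_pickle=False).size == 0
    ]


# Example call (placeholder path for the ChiDist folder):
# print(npy_shapes("/path/to/output_folder/ChiDist"))
```

If `Reads.npy` and `Reads_rev.npy` both show up in `empty_arrays`, the deep learning input was written without any reads in it.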
Yes. It seems that the deep learning data was not generated as expected. Could you send me the files in the ChiDist folder?
I sent them by email to jinzijie@pku.edu.cn
My FASTQ files are 10x, while this tool seems to work only with Smart-seq data. Is there any chance it could work on my 10x data?
Our tool was optimized for Smart-Seq data rather than 10X. While you can run scFusion on a 10X dataset, the performance may be poor.
Hi, great tool! I am trying to set it up on a single-cell RNA experiment from one sample. I have 5 pairs of FASTQ files, renamed according to your nomenclature:
1_1.fastq 1_2.fastq
2_1.fastq 2_2.fastq
3_1.fastq 3_2.fastq
4_1.fastq 4_2.fastq
5_1.fastq 5_2.fastq
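Before launching a run, it can help to confirm that every index in the requested range has both mate files. The sketch below assumes the `<index>_1.fastq` / `<index>_2.fastq` naming shown in the list above; the function `missing_fastqs` and its `begin`/`end` parameters are hypothetical and simply mirror the `-b` and `-e` values used in this thread.

```python
from pathlib import Path


def missing_fastqs(folder, begin, end):
    """List the expected <index>_<mate>.fastq files that are absent from folder.

    begin and end are inclusive indices (assumed to mirror -b and -e).
    """
    folder = Path(folder)
    expected = [
        f"{index}_{mate}.fastq"
        for index in range(begin, end + 1)
        for mate in (1, 2)
    ]
    return [name for name in expected if not (folder / name).is_file()]


# Example call (placeholder path for the FASTQ folder):
# print(missing_fastqs("/path/to/fastq", 1, 5))
```

An empty result only shows that the names are in place, not that the reads themselves suit the tool.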
The steps up to ReadProcessing work without errors. I load the tools partly through conda and partly from modules on my server. In particular, I load samtools from a module because there are issues installing it through conda. In the conda environment I keep:
- tensorflow 2.8.0 cpu_py39h4655687_0 conda-forge
- scipy 1.8.1 py39he49c0e8_0 conda-forge
- numpy 1.22.3 py39hc58783e_2 conda-forge
- star 2.7.10a h9ee0642_0 bioconda
- pysam 0.19.0 py39h5030a8b_0 bioconda
- pyensembl 2.0.0 pyh5e36f6f_0 bioconda
- keras 2.8.0 pyhd8ed1ab_0 conda-forge
- bedtools 2.30.0 h468198e_3 bioconda

I know this does not exactly match the package list in the manual, but certain versions of some packages cannot be installed without updating others. Could these slight variations be responsible for the errors below?
During the genome indexing step I get these warnings, although the command still completes the task:
`/gpfs/home/projects/analisi_vdj/alfredo.marchetti/sc_SC001/utils/fusionenv/lib/python3.9/site-packages/gtfparse/read_gtf.py:82: FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.
chunk_iterator = pd.read_csv(
/gpfs/home/projects/analisi_vdj/alfredo.marchetti/sc_SC001/utils/fusionenv/lib/python3.9/site-packages/gtfparse/read_gtf.py:82: FutureWarning: The warn_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.
chunk_iterator = pd.read_csv(
Data1[index,:,0] = np.array([int(c) for c in ChimericRead[index].upper().replace('A','0').replace('T','1').replace('C','2').replace('G','3').replace('H','4')])
ValueError: could not broadcast input array from shape (27,) into shape (61,)
`
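The `ValueError` above is NumPy's standard shape-mismatch error: an encoded read of length 27 is being assigned into a slot that expects 61 values. The sketch below reproduces that error class on toy data and lists reads whose length differs from the expected one. It is an illustration under stated assumptions, not scFusion's code: the letter-to-digit mapping follows the traceback line, while `encode`, `mismatched_lengths`, and the toy reads are hypothetical.

```python
import numpy as np

# Same letter-to-digit mapping as the .replace() chain in the traceback line.
CODES = {"A": 0, "T": 1, "C": 2, "G": 3, "H": 4}


def encode(read):
    """Encode a read as an integer array, one value per base."""
    return np.array([CODES[base] for base in read.upper()])


def mismatched_lengths(reads, expected_len):
    """Return (index, length) for every read whose length is not expected_len."""
    return [(i, len(r)) for i, r in enumerate(reads) if len(r) != expected_len]


# Toy reads: the first has 61 bases, the second only 27.
reads = ["ACGT" * 15 + "H", "ACGT" * 6 + "ACG"]
data = np.zeros((len(reads), 61, 1))

data[0, :, 0] = encode(reads[0])  # lengths match, so this assignment works
try:
    data[1, :, 0] = encode(reads[1])  # 27 values cannot fill 61 positions
except ValueError as exc:
    message = str(exc)
    print(message)

print(mismatched_lengths(reads, 61))
```

Running a check like `mismatched_lengths` on the reads that feed this step would show whether short reads are the trigger; how such reads should be handled is not established in this thread.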
Could you provide some guidance?
I apologize if the report is incomplete; please let me know if you need additional info.
Greetings
But the main issue occurs when running FusionCandidate:
Starting: 1 Candidate Size: 0 Found Size: 0
Starting: 2 Candidate Size: 16246 Found Size: 16246
Starting: 3 Candidate Size: 22217 Found Size: 22203
Starting: 4 Candidate Size: 25999 Found Size: 25969
Starting: 5 Candidate Size: 28898 Found Size: 28856
Traceback (most recent call last):
File "/home/users/alfredo.marchetti.stud/analisi_vdj/alfredo.marchetti/sc_SC001/utils/scFusion-2.0.2//bin//PreProcessing_SingleFile.py", line 50, in