Closed MasayukiNagai closed 3 years ago
Got your problem. I will check it in three days.
On Sep 18, 2021, at 00:34, Moon @.***> wrote:
Hi,
When I ran the program using Testdata, it caused an error when executing ./bin/CombinePipeline_Retrain.sh as follows.
$ python scFusion.py \
-f Testdata/Testdata/ \
-o TestOut/ \
-b 1 -e 10 -t 20 \
-s
Traceback (most recent call last):
File "scFusion.py", line 273, in
Then, I ran the shell script alone to see what the problem was.
$ sh ./bin/CombinePipeline_Retrain.sh TestOut/ . ./bin/../data/weight-V9-2.hdf5 10 ./bin
Traceback (most recent call last):
File "./bin/Data_preprocess_MyRetrain.py", line 29, in
So, I checked the ./bin/Data_preprocess_MyRetrain.py and its input files. It seems that it gives an error because my TestOut/Retrain/ChimericRead.txt is empty and thus MergePoint = int(readinfo_split[1]) fails.
It seems that even before the python script is executed, some of the files generated have no data in them. (The first empty file generated should be {outdir}/ChiDist/Homo.txt if I understand the flow correctly) The following shows the sizes of files in the ChimericOut, ChiDist, and Retrain folders.
ChimericOut:
bit 2026 Sep 17 22:18 1_FusionSupport.txt
bit 182158 Sep 17 22:18 1_geneanno.sam
bit 177442 Sep 17 22:18 1.sam
...
ChiDist:
bit 0 Sep 17 22:18 ChiDist_middle.txt
bit 0 Sep 17 22:18 FusionRead.txt
bit 0 Sep 17 22:18 Homo.txt
bit 128 Sep 17 22:19 Reads.npy
bit 128 Sep 17 22:19 Reads_rev.npy
Retrain:
bit 0 Sep 17 22:19 ChimericRead.txt
bit 0 Sep 17 22:19 SimuRead.txt
What would be the output you expect to get when running the program with Testdata? I would really appreciate it if you could peek into the problem! Also, if you could upload the results you get when you run the program with Testdata, it would be really helpful to compare with what I get. Thank you!
Hi, MasayukiNagai. This is a little strange: on my side the script runs and gives the expected result. Could you run the third command in the CombinePipeline_startwith_FS.sh file?
python ${codedir}/FindHomoPattern_RAM.py ${FilePath}/ChimericOut/${prefix}FusionScore.txt ${hg19file} ${gtf} > ${FilePath}/ChiDist/${prefix}Homo.txt
Let's see what will happen. If it prints 'Bad Line', please show me the gtf file you use. I guess the script could not understand the gtf file so it printed nothing.
As you suspected, running the command printed several 'Bad Line's.
I attached the gtf file below that you can also get from this page on the UCSC website.
I also tried with hg19.refGene.gtf.gz, only to encounter the same error.
Would you mind telling me where you got your gtf file? (If from Ensembl, which release?)
I have updated the FindHomoPattern_RAM.py file. Please download it and replace the old file. scFusion can run properly with the gtf file you gave me and the new FindHomoPattern_RAM.py file.
Thanks to the change you made, the FindHomoPattern_RAM.py file seems to generate the Homo.txt file without any problems.
However, the next step caused the following error and generated only an empty ChiDist_middle.txt:
$ python ./bin/FindChiDist.py TestOut/ChimericOut/ 1 10 TestOut/Expr/ TestOut/ChiDist/Homo.txt . > TestOut/ChiDist/ChiDist_middle.txt
Traceback (most recent call last):
File "./bin//FindChiDist.py", line 419, in <module>
thischr2 = chr2num(CandidateList[l][3])
File "./bin//FindChiDist.py", line 24, in chr2num
return int(str)
ValueError: invalid literal for int() with base 10: 'Un_gl000220'
It seems that chr2num gives an error because of chrUn_gl000220 in the {i}_FusionSupport.txt file.
Here is the 1_FusionSupport.txt that has chrUn_gl000220, just in case.
Yes, this is a known issue. I will fix it in the next version (v1.4). To work around this bug for now, please delete all lines in the gtf file whose chromosome is not chr1-chr22, chrX, or chrY.
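For anyone applying this workaround by script rather than by hand, here is a minimal sketch of the filtering step. It is not part of scFusion; the function names are hypothetical, and it assumes a standard tab-separated gtf whose first column is the chromosome name. Keeping `#` comment lines is also an assumption, since they carry no chromosome.

```python
# Hypothetical helper, not part of scFusion: keep only gtf records on
# chr1-chr22, chrX and chrY, as the workaround above suggests.
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


def filter_gtf_lines(lines):
    """Yield comment lines and records whose first column is a canonical chromosome."""
    for line in lines:
        if line.startswith("#"):
            # Assumption: header/comment lines are harmless and are kept.
            yield line
            continue
        chrom = line.split("\t", 1)[0]
        if chrom in CANONICAL:
            yield line


def filter_gtf(src_path, dst_path):
    """Write a filtered copy of src_path to dst_path, leaving the original untouched."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(filter_gtf_lines(src))
```

Writing to a new file instead of editing in place keeps the original annotation available for comparison. Records on names such as chrUn_gl000220 or chrM are dropped.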
It seems to work now. Just to be sure, I've got empty {i}.rpkm.txt files in the Expr folder. Is that okay?
Also, would it be possible for you to upload the expected results, or at least some of them, that you get by running the Testdata? I personally find it a little hard to identify an issue when there is one, because the steps are executed via subprocess and the main process keeps running even when a subprocess fails.
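One general way to make a failing step stop the run, sketched here with Python's standard `subprocess.run(..., check=True)`. This is an illustration only, not how scFusion.py launches its steps; `run_step` is a hypothetical name.

```python
import subprocess
import sys


def run_step(cmd, stdout_path=None):
    """Run one pipeline step; raise CalledProcessError on a non-zero exit status."""
    if stdout_path is None:
        subprocess.run(cmd, check=True)
    else:
        # Redirect the step's stdout to a file, like `cmd > file` in a shell script.
        with open(stdout_path, "w") as out:
            subprocess.run(cmd, check=True, stdout=out)


if __name__ == "__main__":
    try:
        # A stand-in command that fails on purpose.
        run_step([sys.executable, "-c", "import sys; sys.exit(3)"])
    except subprocess.CalledProcessError as err:
        print(f"step failed with exit code {err.returncode}", file=sys.stderr)
```

With `check=True`, the error surfaces at the step that failed instead of several steps later as an empty input file.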
The empty expression files are not expected. Could you please delete the bed file and the GenePos.txt file in the data/ folder and rerun scFusion? An example command is below:
python software/scFusion.py -f testdata/ -o testout/ -b 1 -e 10 -s hg19STARIndex/ -t 8 -n 0.9 -g hg19.fa -a ref_annot.gtf
I guess it will work with the proper gtf file.
scFusion is expected to report the IGHJ5-IGHA1 fusion in the testout/FinalResult/FinalOutput.abridged.txt file, as I mentioned in the README.
As you said, it is really hard to identify the issue when scFusion runs its steps as subprocesses. A quick way to check whether it ran properly is to check all the intermediate files and make sure they are not empty.
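The check described above can be scripted. A minimal sketch, assuming only that the intermediate files live somewhere under the output directory; `find_empty_files` is a hypothetical helper, not part of scFusion.

```python
from pathlib import Path


def find_empty_files(outdir):
    """Return the sorted paths of all zero-byte regular files under outdir."""
    return sorted(
        p for p in Path(outdir).rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # "TestOut" is the output folder used earlier in this thread.
    for path in find_empty_files("TestOut"):
        print(f"empty: {path}")
```

A size check only catches zero-byte files. A file that contains just a header, as FinalOutput.abridged.txt did later in this thread, still has to be opened and inspected.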
I see your point about the verification on Testdata.
I ran the command again after deleting the two files, but I still get empty rpkm files. Would you mind attaching the gtf file you are using or sending me a link to where I can get it? (I looked through the STAR-Fusion repo but could not find the gtf file you mentioned.)
Also, the bed file and GenePos.txt in the data folder are empty after the execution.
I am really sorry for the trouble. The attached files are the split gtf I used (the whole file is too big to upload); first unzip them and then concatenate them. The gtf file I use can also be found here. Download the zip file (~30G).
Could you open the CombinePipeline_before_FS.sh in the bin/ folder and run the last command? Let's see what will happen.
Great! I can now get non-empty rpkm files and so on! (I used GRCh37_gencode_v19_CTAT_lib_Mar012021.source/gencode.v19.annotation.gtf, just for the record.)
However, "FinalOutput.abridged.txt" only includes its header, which means that the file is basically empty. I don't see any empty files in any folder but FinalResult right now. I'll look at the code again but do you have any idea what causes this?
Great! The test data here is to help you check whether you can run scFusion properly, so we don't need to be aware of the biological meaning of reported fusions.
Did you add the parameter "-n 0.9" when running scFusion? With the default parameters, scFusion will report no fusion genes in this dataset. If no fusion genes are reported after specifying the -n parameter, please check the Allresult.txt file in FinalResult/temp/ and see whether the IGHJ5-IGHA1 fusion is included in this file.
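A small sketch of that last check. The column layout of Allresult.txt is not described in this thread, so the code assumes only that the file is plain text with gene symbols appearing as whole words on each line; `lines_mentioning` is a hypothetical helper.

```python
import re


def lines_mentioning(lines, gene_a, gene_b):
    """Return the lines in which both gene symbols appear as whole words."""
    pat_a = re.compile(rf"\b{re.escape(gene_a)}\b")
    pat_b = re.compile(rf"\b{re.escape(gene_b)}\b")
    return [
        line.rstrip("\n")
        for line in lines
        if pat_a.search(line) and pat_b.search(line)
    ]


if __name__ == "__main__":
    # Path taken from the reply above; adjust it to your own output folder.
    with open("testout/FinalResult/temp/Allresult.txt") as fh:
        for hit in lines_mentioning(fh, "IGHJ5", "IGHA1"):
            print(hit)
```

Matching on word boundaries avoids false hits on longer symbols that merely start with the same letters.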
After adding "-n 0.9" parameter, I got the expected result and everything looks good!!
Thank you so much for your help!
Amazing! Now, you can run scFusion with your own dataset to detect gene fusion!
And I will fix the bugs mentioned above in the next version, stay tuned!
The good news is that everything runs without any errors. The bad news is that I didn't get any positive results. Both [/scFusion-1.4/Testdata/Testdata/FinalResult/FinalOutput.abridged.txt] and [FinalResult/temp/Allresult_filtered.txt] are generated, but neither contains any information. P.S. I did add the parameter -n 0.9. Maybe something went wrong? While troubleshooting, I changed the way some packages were imported.
Although everything seemed OK, the error messages were not printed to the screen but written to [scfusion/scFusion-1.4/Testdata/Testdata/log.txt]. As a result, I didn't realize that the program had not run "predicting", "Step using Neural Network!", "Start Statistical Model", and so on. All in all, there was a problem with the import of Python modules. Now everything is OK. Thanks a lot!
You're welcome. I will improve the user experience in a later version.
I've successfully run the program on the test data with scFusion v2.0.1, but it would be nice if you could specify -n 0.9 and the expected output in manual.pdf, as you did in README.md before, because just running the commands in the manual generates an output with no fusion in it.
This is not directly relevant to this issue, but after running the FusionReport command, I got "Final Results are in ${outdir}/FinalResult/FinalOutput.abridged.txt", which is printed at scFusion.py:254. However, I could not find the file, probably because it is copied to Result.abridged.txt and the folder is renamed to Resulttemp right after that. Thus, it would be great if you could adjust the print statement at scFusion.py:254. Thank you!
Good suggestions! Please see the latest version!