XiaoTaoWang / EagleC

A deep-learning framework for predicting a full range of structural variations from bulk and single-cell contact maps

"TypeError: sequence item 18: expected str instance, NoneType found", continue with merge-redundant-SVs? #32

Closed tolender closed 1 year ago

tolender commented 1 year ago

Hi,

I ran into this error while running EagleC with "--output-format NeoLoopFinder"; it does not occur with "--output-format full". It seems similar to issue #15, but at a different step of predictSV. The input .mcool is CNV-balanced by NeoLoopFinder, and the error occurs with both "hg38" and "other" (mm10).

Input code:

predictSV --hic-5k Sample.mcool::resolutions/5000 \
--hic-10k Sample.mcool::resolutions/10000 \
--hic-50k Sample.mcool::resolutions/50000 \
-O Sample_EagleC_predictSV \
-g hg38 \
--balance-type CNV \
--output-format NeoLoopFinder \
--prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999

After the breakpoints are found for each resolution, I get the *_SVs.5K.txt, *_SVs.10K.txt, and *_SVs.50K.txt files, but predictSV then fails with a traceback when trying to merge the 10kb and 5kb results.

2023-09-06 02:53:57.050786: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): 
INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype double and shape [2,21,21,1]
         [[{{node Placeholder/_0}}]]
1/1 [==============================] - 0s 5ms/step
root                      INFO    @ 09/06/23 02:53:58: Locate 10kb SV coordinates on the 5kb matrix ...
Traceback (most recent call last):
  File "/home/user/software/anaconda3/envs/eaglec/bin/predictSV", line 176, in <module>
    run()
  File "/home/user/software/anaconda3/envs/eaglec/bin/predictSV", line 130, in run
    subprocess.check_call(' '.join(command), shell=True)
TypeError: sequence item 18: expected str instance, NoneType found

Do I have to re-run predictSV with the cached files to get the combined list, or can I get the equivalent of a properly finished predictSV run by using merge-redundant-SVs on the three resolution files?

Thanks!

XiaoTaoWang commented 1 year ago

Hi, I'm not sure why this error occurs only with "--output-format NeoLoopFinder". However, if you have already obtained the predictions with "--output-format full", you can convert the format using the command below:

$ merge-redundant-SVs --full-sv-files test.full.txt --output-format NeoLoopFinder -O test.NeoLoopFinder.txt

tolender commented 1 year ago

I went through the predictSV code and ran the steps between lines 125 and 158 explicitly with the cache folders, and it worked as intended. It seems the line that extract_cache_folder() looks for in the log files was missing, so it couldn't find the cache folders. Running EagleC locally and in parallel in two tmux sessions might have caused log-updating issues (just a guess).
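
For anyone hitting the same traceback, here is a minimal sketch of why ' '.join(command) raises this TypeError when one entry of the list is None, plus a check that reports which positions are unset before joining. The command list and the helper names find_missing_args and join_command are hypothetical and only for illustration; they are not EagleC's actual code or arguments.

```python
# Hypothetical illustration: the command entries below are placeholders,
# not the real argument list that predictSV builds.

def find_missing_args(command):
    """Return the positions of entries that are None."""
    return [i for i, arg in enumerate(command) if arg is None]


def join_command(command):
    """Join a command list into one string, failing early with a clearer message."""
    missing = find_missing_args(command)
    if missing:
        raise ValueError(f"command has unset arguments at positions {missing}")
    return ' '.join(command)


# Simulates a lookup (e.g. a cache folder parsed from a log) that returned None.
command = ['some-tool', '--cache', None, '-O', 'out.txt']

try:
    ' '.join(command)
except TypeError as err:
    # str.join rejects any non-string item and names its index in the message.
    message = str(err)

print(message)
print(find_missing_args(command))
```

The index in the error message ("sequence item 18" in the traceback above) is the position of the None entry in the list, so printing the list just before the join shows which value was not found.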