dpuiu / MitoHPC

MIT License

Issue reproducing results with example datasets? #16

Open PFRoux opened 6 months ago

PFRoux commented 6 months ago

Hi!

Thanks a lot for this really promising tool.

On my side, everything seems to run smoothly with the "example1" data. However, every value in the mutect2 summary files is 0:

cat /usr/local/bioinfo/src/MitoHPC/example_on_cluster/out/*summary

id  count   nonZero min max median  mean    sum
H   30  0   0   0   0   0   0
h   30  0   0   0   0   0   0
S   30  0   0   0   0   0   0
s   30  0   0   0   0   0   0
I   30  0   0   0   0   0   0
i   30  0   0   0   0   0   0
Hp  30  0   0   0   0   0   0
hp  30  0   0   0   0   0   0
Sp  30  0   0   0   0   0   0
sp  30  0   0   0   0   0   0
Ip  30  0   0   0   0   0   0
ip  30  0   0   0   0   0   0
A   30  0   0   0   0   0   0
id  count   nonZero min max median  mean    sum
H   30  0   0   0   0   0   0
h   30  0   0   0   0   0   0
S   30  0   0   0   0   0   0
s   30  0   0   0   0   0   0
I   30  0   0   0   0   0   0
i   30  0   0   0   0   0   0
Hp  30  0   0   0   0   0   0
hp  30  0   0   0   0   0   0
Sp  30  0   0   0   0   0   0
sp  30  0   0   0   0   0   0
Ip  30  0   0   0   0   0   0
ip  30  0   0   0   0   0   0
A   30  0   0   0   0   0   0
id  count   nonZero min max median  mean    sum
H   30  0   0   0   0   0   0
h   30  0   0   0   0   0   0
S   30  0   0   0   0   0   0
s   30  0   0   0   0   0   0
I   30  0   0   0   0   0   0
i   30  0   0   0   0   0   0
Hp  30  0   0   0   0   0   0
hp  30  0   0   0   0   0   0
Sp  30  0   0   0   0   0   0
sp  30  0   0   0   0   0   0
Ip  30  0   0   0   0   0   0
ip  30  0   0   0   0   0   0
A   30  0   0   0   0   0   0
id  count   nonZero min max median  mean    sum
H   30  0   0   0   0   0   0
h   30  0   0   0   0   0   0
S   30  0   0   0   0   0   0
s   30  0   0   0   0   0   0
I   30  0   0   0   0   0   0
i   30  0   0   0   0   0   0
Hp  30  0   0   0   0   0   0
hp  30  0   0   0   0   0   0
Sp  30  0   0   0   0   0   0
sp  30  0   0   0   0   0   0
Ip  30  0   0   0   0   0   0
ip  30  0   0   0   0   0   0
A   30  0   0   0   0   0   0
id  count   nonZero min max median  mean    sum
H   30  0   0   0   0   0   0
h   30  0   0   0   0   0   0
S   30  0   0   0   0   0   0
s   30  0   0   0   0   0   0
I   30  0   0   0   0   0   0
i   30  0   0   0   0   0   0
Hp  30  0   0   0   0   0   0
hp  30  0   0   0   0   0   0
Sp  30  0   0   0   0   0   0
sp  30  0   0   0   0   0   0
Ip  30  0   0   0   0   0   0
ip  30  0   0   0   0   0   0
A   30  0   0   0   0   0   0
id  count   nonZero min max median  mean    sum
H   30  0   0   0   0   0   0
h   30  0   0   0   0   0   0
S   30  0   0   0   0   0   0
s   30  0   0   0   0   0   0
I   30  0   0   0   0   0   0
i   30  0   0   0   0   0   0
Hp  30  0   0   0   0   0   0
hp  30  0   0   0   0   0   0
Sp  30  0   0   0   0   0   0
sp  30  0   0   0   0   0   0
Ip  30  0   0   0   0   0   0
ip  30  0   0   0   0   0   0
A   30  0   0   0   0   0   0
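With six concatenated summary files, a single non-zero entry is easy to miss by eye. Here is a minimal shell sketch that checks the `nonZero` column (third field) across all files at once. It assumes every file uses the header shown above; `all_zero_summary` is a hypothetical helper, not part of MitoHPC:

```shell
# Hypothetical helper (not part of MitoHPC).
# Assumption: each *summary file is whitespace-separated with the header
# "id count nonZero min max median mean sum", as in the output above.
all_zero_summary() {
  # Prints ALL_ZERO if no data row has nonZero > 0, otherwise HAS_CALLS.
  awk '
    $1 == "id" { next }                     # skip the header of each file
    NF >= 3 && $3 + 0 > 0 { found = 1 }     # any sample with a non-zero count
    END { print (found ? "HAS_CALLS" : "ALL_ZERO") }
  ' "$@"
}
```

For example, `all_zero_summary out/*summary` prints `ALL_ZERO` for the output above.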

Is this expected?

I ask because running the workflow on my own data gave the same result, which seems suspicious.

Am I missing something?

Thanks a lot and have a great day.

++

Pef

dpuiu commented 6 months ago

Hi Pef, sorry about the late reply, and thank you for letting me know about this issue.

The problem with the example1 samples is that the reads were simulated at low coverage (~100x), which is about the same as the minimum coverage threshold (init.sh: HP_DP=100). I have updated the code and set HP_DP=50, which should be "high enough".
To rerun example1, just follow the steps described in the "RE-RUN PIPELINE (optional)" section of the README.
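To see why a threshold close to the mean coverage can remove so many sites, here is a rough sketch. It assumes, for illustration only, that HP_DP acts as a per-position minimum depth. `count_passing_sites` is a hypothetical helper; it reads the depth from the last column, so it accepts two-column `position depth` input or three-column `chrom position depth` input such as the output of `samtools depth -a sample.bam` (the BAM name is hypothetical):

```shell
# Hypothetical helper (not part of MitoHPC).
# Assumption (illustrative only): HP_DP behaves like a per-site minimum depth.
# Input: one position per line, depth in the last whitespace-separated column.
count_passing_sites() {
  local min_dp=$1 depth_file=$2
  awk -v min="$min_dp" '
    NF >= 2 { total++; if ($NF + 0 >= min) pass++ }   # depth is the last field
    END { printf "%d/%d\n", pass + 0, total + 0 }      # passing/total positions
  ' "$depth_file"
}
```

With depths that scatter around 100x, `count_passing_sites 100 depth.txt` keeps only part of the positions, while `count_passing_sites 50 depth.txt` keeps nearly all of them.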

PFRoux commented 6 months ago

Hi!

Thanks a lot for your feedback.

Unfortunately, even lowering this parameter to 20 still produces the same output: all zeros.

The problem is that I get the same result when running MitoHPC on my own samples, and I can't tell whether this comes from my samples or from MitoHPC failing to run properly. The STDERR and STDOUT output is quite "complicated", and I am struggling to tell whether something went wrong. I really need to validate the workflow on a clean example dataset to make sure there is genuinely nothing to find in my own data (which I really doubt).
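One rough way to triage noisy STDERR/STDOUT is to pull out only the lines that look like failures. This is a generic sketch: it assumes the logs were saved to files, and the keyword list is a guess rather than anything MitoHPC-specific. `scan_logs` is a hypothetical helper:

```shell
# Hypothetical helper (not part of MitoHPC).
# Assumption: stdout/stderr were redirected to the files passed as arguments.
# The keyword list is a generic guess; adjust it to the tools in your run.
scan_logs() {
  # -H: always print the file name, -n: line number, -i: ignore case
  grep -H -n -i -E 'error|exception|not found|killed|segmentation fault' "$@"
}
```

Run it on your saved log files (for example `scan_logs logs/*.err`, a hypothetical path). An empty result does not prove the run succeeded; it only means none of the keywords appeared.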

Could you help me please ?

Thanks a lot.

Pef

dpuiu commented 6 months ago

Have you tried running MitoHPC on the examples2/ samples?

Each sample's VCF file should contain a few dozen SNVs. You can count them with:

$ grep -c "^c" examples/out/sample_/sample_.mutect2.vcf
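The same check can be run over many samples at once. Here is a minimal sketch, assuming plain-text (not bgzipped) VCFs. It counts every non-header line with `^[^#]`, which matches the same records as `^c` when contigs are named like `chrM` but does not depend on the contig name. `count_vcf_records` is a hypothetical helper:

```shell
# Hypothetical helper (not part of MitoHPC).
# Assumption: plain-text VCFs; header lines start with '#'.
count_vcf_records() {
  local vcf n
  for vcf in "$@"; do
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that harmless
    n=$(grep -c '^[^#]' "$vcf" || true)
    printf '%s\t%s\n' "$vcf" "$n"
  done
}
```

Pass it the same VCF paths as the command above. A sample that prints 0 has no calls at all.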