WangJiuming / YOLOOP

Can YOLOOP be used to predict loops in low-coverage data? #3

Closed YusenHou519 closed 2 months ago

YusenHou519 commented 2 months ago

Hi,

Thank you for your excellent work.

I am wondering whether YOLOOP can be used to predict loops in low-coverage data. When I ran your software on GM12878 data at 5kb resolution with 500M coverage, I noticed that the number of detected loops was relatively low. Could you please advise on how to adjust the software or its parameters to improve loop detection in such cases?

The script that I used:

#!/bin/bash
YOLOOPDIR="./YOLOOP"
cd $YOLOOPDIR

##---------------------------
## Parameters
##---------------------------

RESOL=5000
cover="frac0.125"
THRESHOLD=0.0
BALANCE="false"
OUTPATH="./yoloop_GM12878_500M"

##---------------------------
## Fixed Parameters
##---------------------------

PYTHONFILE="${YOLOOPDIR}/detect.py"
MODELPATH="${YOLOOPDIR}/models/gm12878_hic_10kb.pt"
COOLPATH="./4DNFIXP4QG5B_Rao2014_GM12878_${cover}.hg38.mapq_10.1000.mcool"

##--------------------------------------------------
## Create the OUTPATH directory if it doesn't exist
##--------------------------------------------------

if [ ! -d "$OUTPATH" ]; then
    mkdir -p "$OUTPATH"
fi

##--------------------------------------------------
## Run
##--------------------------------------------------

echo "Starting chromosome processing..." 

if [ "$BALANCE" == "false" ]; then

    python "$PYTHONFILE" --cm "$COOLPATH" -r "$RESOL" -m "$MODELPATH" --out "$OUTPATH" -t "$THRESHOLD"

elif [ "$BALANCE" == "true" ]; then

    python "$PYTHONFILE" --cm "$COOLPATH" -r "$RESOL" -m "$MODELPATH" --out "$OUTPATH" -t "$THRESHOLD" -b

fi

echo "Done."

cd -

The result: yoloop_pred_4DNFIXP4QG5B_Rao2014_GM12878_frac0.125.hg38.mapq_10.1000.zip

Thank you for your time and guidance.

Best regards, Yusen

frankchen121212 commented 2 months ago

Hi Yusen,

It looks like you are applying YOLOOP to low-coverage data: the input file is the GM12878 cell line at 5kb resolution, but with only 500M coverage.

A model trained on bulk Hi-C data might not perform well on such input. I've uploaded a weight file trained on scHi-C data, that is, under extremely low coverage. It might help in your situation.
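For reference, switching weights only requires pointing the -m argument at the new file. The snippet below is a minimal sketch, not a tested recipe: it reuses the flags from the script above (--cm, -r, -m, --out, -t), assumes it is run from the YOLOOP repository root, and uses a hypothetical weight file name (schic_weights.pt) and output directory that should be replaced with the actual ones. By default it only prints the command; set RUN=1 to execute it.

```shell
#!/bin/bash
# Sketch: rerun detect.py with a different weight file.
# ASSUMPTION: "schic_weights.pt" is a hypothetical name; substitute the
# actual scHi-C weight file from the repository.
MODELPATH="./models/schic_weights.pt"
COOLPATH="./4DNFIXP4QG5B_Rao2014_GM12878_frac0.125.hg38.mapq_10.1000.mcool"
RESOL=5000
THRESHOLD=0.0
OUTPATH="./yoloop_GM12878_500M_schic"   # hypothetical output directory

# Print the command by default; execute it only when RUN=1.
run_or_print() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Create the output directory only for a real run.
if [ "${RUN:-0}" = "1" ]; then
    mkdir -p "$OUTPATH"
fi

run_or_print python detect.py --cm "$COOLPATH" -r "$RESOL" \
    -m "$MODELPATH" --out "$OUTPATH" -t "$THRESHOLD"
```

Everything except the model path and output directory stays the same as in the original run, so the two result sets can be compared directly.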

In the meantime, could you share the source link for your 4DNFIXP4QG5B_Rao2014_GM12878 data with 500M coverage? We might look into that.

Thank you.

YusenHou519 commented 2 months ago

Hi Frank,

Thank you for your suggestion.

The weight file you provided works on low-coverage Hi-C data. I'm currently benchmarking some of the top loop-detection methods. The data I'm using is downsampled directly from the original GM12878 dataset, which you can access here: 4DNFIXP4QG5B.

Thanks again!