bcgsc / goldrush

Linear-time de novo Long Read Assembler
GNU General Public License v3.0

Using existing Draft Assembly in place of Gold Path Input #113

Closed Sgakeson closed 1 year ago

Sgakeson commented 1 year ago

Hello, I'm wondering if it's possible to use the editing and polishing steps of GoldRush on an existing draft assembly without running the GoldPath portion? Thank you, Stuart

jwcodee commented 1 year ago

Hello Stuart,

Thank you for your interest in using GoldRush.

It is possible. I do not have access to a computer at the moment, so I cannot give you very specific instructions. What you can do is run 'goldrush' with the 'dry-run' option to see which files each step expects. You can then soft-link your existing genome assembly to the input of GoldPolish (goldrush-edit).

lcoombe commented 1 year ago

Hi @Sgakeson,

Just adding to what @jwcodee mentioned: if you only want to run the polishing step of GoldRush, "GoldPolish" (aka GoldRush-Edit), you can follow the instructions in this repo, which is a submodule of GoldRush: https://github.com/bcgsc/goldrush-edit

If you were hoping to run additional steps (e.g. GoldPath + Tigmint-long + GoldChain), then yes, the easiest approach would be to use the dry-run option as @jwcodee mentioned. For example, if I run the command from the provided tests as a dry run, I get this output:

(goldrush) lcoombe03:tests lcoombe$ goldrush run reads=test_reads G=1e6 t=4 p=goldrush_test -n
goldrush-path  -k 22 -w 16 -t 1000 -u 5 -a 1 -o 0.1 -p goldrush_test_silver_path -i test_reads.fq  -h 3 -j 4 -x10 -P 15 -d 5 -s 1011011110110111101101 -g 1e6 -b 10 -r 0.9 --silver_path -M 5 -m 20000
cat goldrush_test_silver_path_*.fq > goldrush_test_silver_path_all.fq
goldrush-path  -k 22 -w 16 -t 1000 -u 5 -a 1 -o 0.1 -p goldrush_test_golden_path -i goldrush_test_silver_path_all.fq -h 3 -j 4 -P 15 -d 5 -x10 -s 1011011110110111101101 -g 1e6 -b 10  -m 0
echo "Done GoldRush-Path! Golden path can be found in: goldrush_test_golden_path.fa"
rm -f goldrush_test_silver_path_*.fq
echo "Clean Done"
goldrush-edit --minimap2 -t4 goldrush_test_golden_path.fa test_reads.fq goldrush_test_golden_path.goldrush-edit-polished.fa
echo "Done GoldRush-Path + GoldRush-Edit! GoldRush-Edit polished golden path can be found in: goldrush_test_golden_path.goldrush-edit-polished.fa"
tigmint-make tigmint-long draft=goldrush_test_golden_path.goldrush-edit-polished reads=test_reads cut=250 t=4 G=1e6 span=2 dist=500
ln -sf goldrush_test_golden_path.goldrush-edit-polished.cut250.tigmint.fa goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa
echo "Done GoldRush-Path + GoldRush-Edit + Tigmint-long! Post-Tigmint-long golden path can be found in: goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa"
ntLink_rounds run_rounds_gaps target=goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa t=4 k=40 w=250 z=1000 rounds=5 reads=test_reads.fq
ntLink_rounds clean target=goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa t=4 k=40 w=250 z=1000 rounds=5 reads=test_reads.fq
ln -sf goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k40.w250.z1000.ntLink.gap_fill.5rounds.fa goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.fa
echo "Done GoldRush-Path + GoldRush-Edit + Tigmint-long + 5 ntLink rounds! Your final assembly can be found in: goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.fa"

To start at the GoldPolish/GoldRush-Edit stage, you could soft-link your existing assembly to the name goldrush_test_golden_path.fa in your current working directory. If you then launch the same command again, the pipeline should start after the golden path stage.
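As a rough sketch of the soft-link step described above: the helper name `link_draft_as_golden_path` and the draft file name `my_draft_assembly.fa` are hypothetical and not part of GoldRush. The only assumption taken from the dry-run output is that the pipeline expects the golden path at `<p>_golden_path.fa` in the working directory, where `<p>` is the value passed as `p=`.

```shell
# Hypothetical helper (not part of GoldRush): soft-link an existing draft
# assembly to the golden path file name that the dry-run output shows as
# the input of goldrush-edit, i.e. <prefix>_golden_path.fa.
link_draft_as_golden_path() {
    draft="$1"     # path to your existing draft assembly (FASTA)
    prefix="$2"    # must match the p= value you pass to goldrush
    target="${prefix}_golden_path.fa"

    # Refuse to link a missing or empty file.
    if [ ! -s "$draft" ]; then
        echo "error: draft assembly '$draft' is missing or empty" >&2
        return 1
    fi

    # Create (or replace) the soft link in the current working directory.
    ln -sf "$draft" "$target"
    echo "$target"
}

# Example usage (commented out; file names are placeholders):
#   link_draft_as_golden_path my_draft_assembly.fa goldrush_test
#   goldrush run reads=test_reads G=1e6 t=4 p=goldrush_test -n
```

After linking, repeating the dry run is a cheap way to check that the `goldrush-path` commands no longer appear in the plan before launching the real run.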

Let us know if you have any further questions - thank you for your interest in GoldRush! Lauren

Sgakeson commented 1 year ago

Thank you both very much! I'll start testing these functionalities out.