dfguan / purge_dups

haplotypic duplication identification tool
MIT License

species identifier (spid) & platform (PLTFM) #23

Open sjfleck opened 4 years ago

sjfleck commented 4 years ago

Thank you for this program. I was told about it during PAG2020 and am finally trying it out now for my ONT/Illumina hybrid assembly (created using a minimap2/miniasm/racon/pilon assembly pipeline).

I have a quick question about the spid for step 3 of the usage. I don't see it described anywhere or in anyone's issues. Is there an official list of species identifiers somewhere or is this a personal code that we come up with ourselves? I'm guessing that in your example for step 3, "iHelSar1" is the species identifier. I'm not finding results for this online, so I'm guessing that it's made by the person running the program.

Additionally, I'm wondering if there is a list of platforms. At my university, I submit jobs to various cluster partitions using the Slurm Workload Manager. I'm a little confused about what I should input for the -p flag. Any help would be appreciated.

Thanks again, Steve

sjfleck commented 4 years ago

I'm also noticing in the run_purge_dups.py file that this is the minimap2 line of code on line 124:

jcmd = "minimap2 {4} -x map-pb -t {0} {1} {2} >{3}".format(core_lim, ref, fl_strip, out_fn, idx_opt)

In minimap2's usage, it says that "map-pb" works for both PacBio and Oxford Nanopore reads, so I assume that's fine, but I was wondering whether I would need to make any changes because I don't have PacBio reads (or whether I possibly can't use this tool at all). Thanks.

dfguan commented 4 years ago

Hi Steve,

Thanks for trying purge_dups. 

We do have our own species identifiers and use them for our input, but you do not have to use the same ones; you can make up your own.

As for the platform, the input should be "slurm" in your case. I have only tested it once on Slurm, so I cannot guarantee it will work for you. It may be easier to put the commands from the Pipeline Guide in the Readme file into a shell script and submit that to your cluster.
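
The shell-script route can be sketched roughly as follows. This is a minimal illustration, not part of purge_dups: the function name `render_sbatch_script`, the partition name "general", and the placeholder `echo` steps are all assumptions, and the real commands would have to be copied from the Pipeline Guide. The `#SBATCH` options used here (`--job-name`, `--partition`, `--cpus-per-task`) are standard Slurm directives.

```python
def render_sbatch_script(commands, partition, cpus=8, job_name="purge_dups"):
    """Return the text of a Slurm batch script that runs `commands` in order.

    Hypothetical helper for illustration; it is not shipped with purge_dups.
    """
    header = [
        "#!/bin/bash",
        "#SBATCH --job-name={}".format(job_name),
        "#SBATCH --partition={}".format(partition),
        "#SBATCH --cpus-per-task={}".format(cpus),
        # Abort the job as soon as one pipeline step fails.
        "set -euo pipefail",
    ]
    return "\n".join(header + list(commands)) + "\n"


if __name__ == "__main__":
    # Placeholder steps: replace with the commands from the Pipeline Guide.
    steps = ["echo 'step 1: map reads'", "echo 'step 2: purge duplicates'"]
    # "general" is an assumed partition name; use one from your own cluster.
    print(render_sbatch_script(steps, partition="general"), end="")
```

Saving the printed text to a file such as run_pipeline.sh and submitting it with `sbatch run_pipeline.sh` would run the steps as a single job on the chosen partition.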

As for the mapping command, I have modified the script to support minimap2 and bwa options. If you cannot use these tools, you could either modify the script or write your own shell script following the Pipeline Guide.
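
For anyone modifying the script by hand, one possible shape of such a change is sketched below. It is not the content of the attached scripts: `build_minimap2_cmd` and the `read_type` labels are hypothetical names, and the format string simply mirrors the line quoted earlier in this thread with the preset made configurable. `map-pb` and `map-ont` are both existing minimap2 `-x` presets.

```python
# Map a hypothetical read-type label to a real minimap2 -x preset.
PRESETS = {"pacbio": "map-pb", "ont": "map-ont"}


def build_minimap2_cmd(core_lim, ref, reads, out_fn, idx_opt="", read_type="pacbio"):
    """Build the minimap2 command string with the preset chosen by read type.

    Illustrative only; argument order follows the format call quoted in this thread.
    """
    if read_type not in PRESETS:
        raise ValueError("unknown read type: {}".format(read_type))
    return "minimap2 {4} -x {5} -t {0} {1} {2} >{3}".format(
        core_lim, ref, reads, out_fn, idx_opt, PRESETS[read_type]
    )


if __name__ == "__main__":
    # File names are placeholders for an ONT read set and a draft assembly.
    print(build_minimap2_cmd(4, "asm.fa", "reads.fq.gz", "out.paf", "-I4G", "ont"))
```

Keeping the preset in a small lookup table means an unsupported read type fails loudly instead of silently falling back to the PacBio preset.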

The modified scripts are attached. Please replace the corresponding files in the purge_dups scripts directory.

Please let me know if you have any questions. 

Thanks,

Dengfeng.

scrip.zip

sjfleck commented 4 years ago

Dengfeng, thank you very much for all your help! This is very useful. -Steve