clemgoub / dnaPipeTE

dnaPipeTE (for de-novo assembly & annotation Pipeline for Transposable Elements) is a pipeline designed to find, annotate and quantify Transposable Elements in small samples of NGS datasets. It is very useful to quantify the proportion of TEs in newly sequenced genomes since it does not require genome assembly and works on small datasets (< 1X).

Issues installing on a cluster/HPC #35

Closed tomh1lll closed 1 year ago

tomh1lll commented 4 years ago

Hi, I use dnaPipeTE frequently and it's great. I would like to get it working on my university's cluster, but that requires some changes which I cannot work out. Specifically, my cluster has RepeatMasker and Trinity already installed in a module loading system, so I cannot install them again, but I can use the preinstalled copies. How do I change dnaPipeTE to use the Trinity and RepeatMasker that are already in the environment/path (as in, I can use Trinity by just typing Trinity)?

Any help would be great.

Thanks

clemgoub commented 4 years ago

Hi Tom,

Thanks a lot for your message. I think we can tweak the code so it runs on your cluster. I need to look in the code again because there is a parameters file that overrides the paths when dnaPipeTE starts. Can you send me the list of programs that you need to load, as well as the command you use on your cluster to load them?

Thanks,

Clément

PS: a docker version of dnaPipeTE is in the works, I hope before the end of the year. This would make dnaPipeTE portable to virtually any machine.

tomh1lll commented 4 years ago

Hi Clement

Thanks for the speedy reply.

On my cluster I have to load the preinstalled packages of interest using a command such as:

module load Trinity

Which I can do before I run dnaPipeTE, so that doesn't need to be part of the program.

Then I can use Trinity as if it were installed in the path, so I don't need to give a directory leading to it. As far as I'm aware, the preinstalled packages are Trinity, RepeatMasker (and I assume, by extension, TRF), blast (as blast+, but I can still call blastn and blastp), java/1.8.0_212, python3, perl and R.

I'm not sure about GNU parallel but pretty certain it's not loaded.
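A quick way to settle which tools are really reachable after `module load` is to ask the shell's search path directly. The sketch below is only an illustration: the command names in `REQUIRED` are assumptions based on the tools named in this thread (for example, that TRF is exposed as `trf` and GNU parallel as `parallel`), and the helper names are hypothetical.

```python
import shutil

# Command names are assumptions taken from the tools mentioned in this thread;
# adjust them to whatever your cluster's modules actually expose.
REQUIRED = ["Trinity", "RepeatMasker", "trf", "blastn", "java",
            "python3", "perl", "R", "parallel"]


def find_tools(names, which=shutil.which):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(names, which=shutil.which):
    """Return the command names that could not be found on PATH."""
    return [name for name, path in find_tools(names, which).items()
            if path is None]
```

Running `print(missing_tools(REQUIRED))` in the same session where the modules were loaded lists anything that still needs to be installed or loaded.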

Hopefully this is helpful, a docker version does sound cool!

Thanks so much for the help and the reply

Regards

Tom



clemgoub commented 4 years ago

Hey Tom,

I understand. So, if you load the modules in your path, you can modify the file config.ini. It contains the default arguments for dnaPipeTE and the paths to the programs. It is generated automatically if none is present and is overridden by the arguments passed to dnaPipeTE:

[DEFAULT]
repeatmasker = bin/RepeatMasker/RepeatMasker
trinity_glue = 1
blast_folder = bin/ncbi-blast-2.2.28+/bin/
sample_number = 2
sample_size = 500000
trinity_memory = 10G
trf = bin/trf
trinity = bin/trinityrnaseq-Trinity-v2.4.0/Trinity
repeatmasker_library =
parallel = bin/parallel
rm_species = All

So I guess if you change each path to just the program name as it is called in your path (for example, Trinity instead of bin/trinityrnaseq-Trinity-v2.4.0/Trinity), this should do the trick! Please let me know if other issues show up.
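The edit can also be scripted rather than done by hand. This is a minimal sketch, not part of dnaPipeTE: the function names are hypothetical, the command names in `PROGRAMS` are assumptions about what the loaded modules expose, and pointing `blast_folder` at the directory that contains `blastn` is a guess at how that key is used, since it holds a folder rather than a program.

```python
import configparser
import io
import os
import shutil

# Keys come from the config.ini listing above; the command names on the right
# are assumptions about what is on PATH after `module load`.
PROGRAMS = {
    "repeatmasker": "RepeatMasker",
    "trinity": "Trinity",
    "trf": "trf",
    "parallel": "parallel",
}


def use_path_programs(config_text, which=shutil.which):
    """Return config text with program paths replaced by bare command names.

    A key is only changed when its command is actually found on PATH.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    defaults = parser["DEFAULT"]
    for key, command in PROGRAMS.items():
        if which(command) is None:
            print(f"warning: {command} not on PATH; leaving {key} unchanged")
            continue
        defaults[key] = command
    # blast_folder is a directory, so use the folder that holds blastn
    # (assumption: dnaPipeTE looks for the BLAST programs inside it).
    blastn = which("blastn")
    if blastn is not None:
        defaults["blast_folder"] = os.path.dirname(blastn) + os.sep
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def rewrite_config(path="config.ini"):
    """Rewrite a config file in place; back it up first if it matters."""
    with open(path) as handle:
        updated = use_path_programs(handle.read())
    with open(path, "w") as handle:
        handle.write(updated)
```

Calling `rewrite_config()` from the dnaPipeTE directory, after loading the modules, would update the file in one step. Keys whose programs are not found are left untouched, so a missing GNU parallel keeps its bundled `bin/parallel` path.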

Cheers,

Clément