eead-csic-compbio / get_homologues

GET_HOMOLOGUES: a versatile software package for pan-genome analysis

files in tmp are not re-used #74

Closed: anacristinareis closed this issue 3 years ago

anacristinareis commented 3 years ago

Hi,

I'm working on a pan-genome analysis of Mycobacterium species. After the identification of orthologs across all 70 genomes and the identification of inparalogs, the analysis is progressing slowly.

find_OMCL_clusters: parsing clusters (/Users.../tmp/all_ortho.mcl).

Splitting clusters by Pfam domain composition

Split Pfam clusters

Sample 0 (1317.gbk)

Then it performs all-vs-all homologue comparisons using the 70 genomes.

However, when another sample is added, it performs the comparisons between genomes again instead of re-using the files generated in the previous step.

Thanks for all your help.

brunocontrerasmoreira commented 3 years ago

Hi @anacristinareis, can you please confirm you are running get_homologues.pl with param -c? Are you also using -m cluster? In that case it needs to re-compute homologues every time a new genome is added, to tell core- from pan-genes. Because these comparisons depend on the order, many will need to be computed the first time they're needed, and will then be re-used if encountered again. If you are not using -m cluster, I strongly recommend you combine -m dryrun with GNU parallel as explained in http://eead-csic-compbio.github.io/get_homologues/manual/manual.html#dryrun and in a recent thread (https://github.com/eead-csic-compbio/get_homologues/issues/72). That would run those operations in parallel as much as possible. Hope this helps, Bruno
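Bruno's explanation of re-use (each comparison is computed the first time it is needed and re-used afterwards) can be pictured as a simple file-based cache. The Bash sketch below only illustrates that pattern under stated assumptions and is not get_homologues' actual code: result files borrow the homologues_<genomeA>_<genomeB> naming visible in the tmp listing shared in this thread, and compute_pair and get_pair are hypothetical helpers, with compute_pair standing in for the real pairwise search.

```shell
#!/usr/bin/env bash
# Illustration only (NOT get_homologues' actual code): the "compute the first
# time, re-use afterwards" caching pattern. Result files mimic the
# homologues_<genomeA>_<genomeB> names seen in tmp/; compute_pair is a
# hypothetical stand-in for the real, expensive pairwise homologue search.
set -euo pipefail

COMPUTED=0   # pairs actually computed (not re-used) in this shell session

# Hypothetical placeholder for the expensive comparison of genomes a and b.
compute_pair() {
  local a="$1" b="$2" out="$3"
  printf 'homologues of %s vs %s\n' "$a" "$b" > "$out"
  COMPUTED=$((COMPUTED + 1))
}

# Print the result file for (a, b), computing it only if it is missing/empty.
get_pair() {
  local tmpdir="$1" a="$2" b="$3"
  local out="$tmpdir/homologues_${a}_${b}"
  if [ ! -s "$out" ]; then
    compute_pair "$a" "$b" "$out"
  fi
  echo "$out"
}
```

After sourcing it, calling get_pair "$dir" 1317.gbk 2397.gbk twice computes the pair only once; because the result lives on disk, a later session pointing at the same directory re-uses it as well.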

anacristinareis commented 3 years ago

Hello,

I'm using this code: "./get_homologues.pl -d Samples -M -D -t0 -c ".

Can I stop my analysis, which is still running, and run "./get_homologues.pl -d Samples -M -D -t0 -c -m dryrun"?

Thanks for your help. Ana Reis

(Sent by email in reply to brunocontrerasmoreira's comment of Friday, 21/05/2021, 15:42.)

brunocontrerasmoreira commented 3 years ago

Hi, sure, you can stop that process. Then please run: ls -ltr Samples_homologues/tmp | tail and share the output here.

anacristinareis commented 3 years ago

Hi,

Just to confirm: the only command I need to run is "ls -ltr Samples_homologues/tmp | tail"?

Sorry for the question.

Thanks, Ana Reis

(Sent by email in reply to brunocontrerasmoreira's comment of Friday, 21/05/2021, 16:19.)

brunocontrerasmoreira commented 3 years ago

Yes, after killing the get_homs process. That ls command will show you the size and timestamp of the saved homologue results.
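To get the same information as a one-line summary, a small Bash sketch like the following could count the saved results and report the smallest and largest file sizes. It assumes, as in the listing shared in this thread, that the results are files named homologues_* inside the tmp folder; summarize_tmp is a hypothetical helper, and sizes are read with wc -c so it behaves the same with GNU and BSD (macOS) tools.

```shell
#!/usr/bin/env bash
# Sketch: summarize saved pairwise results in a get_homologues tmp/ folder
# (file count plus smallest and largest size in bytes). ASSUMPTION: result
# files are named homologues_*, as in the listing shared in this thread.
set -euo pipefail

summarize_tmp() {
  local tmpdir="$1" n=0 min="" max=0 size f
  for f in "$tmpdir"/homologues_*; do
    [ -f "$f" ] || continue              # no match: the glob stays literal
    size=$(wc -c < "$f" | tr -d ' ')     # BSD wc (macOS) pads with spaces
    n=$((n + 1))
    if [ -z "$min" ] || [ "$size" -lt "$min" ]; then min=$size; fi
    if [ "$size" -gt "$max" ]; then max=$size; fi
  done
  echo "files=$n min=${min:-0} max=$max"
}

# Only act when given a directory, e.g.: summarize_tmp.sh Samples_homologues/tmp
if [ "$#" -gt 0 ]; then
  summarize_tmp "$1"
fi
```

Running it as bash summarize_tmp.sh Samples_homologues/tmp (hypothetical script name) prints a line of the form files=N min=... max=...; a narrow min/max range means the saved files have similar sizes.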

anacristinareis commented 3 years ago

Hi,

Here is the result output:

(base) MBP-de-Ana:get_homologues anareis$ ls -ltr Mbovis_homologues/tmp | tail

-rw-r--r-- 1 anareis staff 75683 21 Mai 17:39 homologues_2397.gbk_Mb1841.gbk
-rw-r--r-- 1 anareis staff 75449 21 Mai 17:39 homologues_2397.gbk_SRR1791984.gbk
-rw-r--r-- 1 anareis staff 69401 21 Mai 17:39 homologues_2397.gbk_Reference.gb
-rw-r--r-- 1 anareis staff 71547 21 Mai 17:39 homologues_2397.gbk_601.gbk
-rw-r--r-- 1 anareis staff 75179 21 Mai 17:39 homologues_2397.gbk_Mb1712.gbk
-rw-r--r-- 1 anareis staff 71020 21 Mai 17:39 homologues_2397.gbk_ERR1203064.gbk
-rw-r--r-- 1 anareis staff 71241 21 Mai 17:39 homologues_2397.gbk_1785.gbk
-rw-r--r-- 1 anareis staff 74909 21 Mai 17:39 homologues_2397.gbk_Mb565.gbk
-rw-r--r-- 1 anareis staff 75503 21 Mai 17:40 homologues_2397.gbk_1339.gbk
-rw-r--r-- 1 anareis staff 75107 21 Mai 17:40 homologues_2397.gbk_NZ_1_Canada.gbk

Thanks, Ana Reis

(Sent by email in reply to brunocontrerasmoreira's comment of Friday, 21/05/2021, 17:28.)

brunocontrerasmoreira commented 3 years ago

It all looks good; see how the files have similar sizes?

  1. Check whether parallel is installed by typing it in the terminal; otherwise install it.
  2. You can now rerun adding -m dryrun, passing the batch file to parallel as explained in the manual.
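Steps 1 and 2 can be sketched in Bash as follows. This is a hedged illustration, not a script shipped with get_homologues: run_batch and count_jobs are hypothetical helpers, and the batch file path (for example Samples_homologues/dryrun.txt) is an assumption; use the file that the -m dryrun run actually reports, as described in the manual's dryrun section.

```shell
#!/usr/bin/env bash
# Sketch of steps 1 and 2: check for GNU parallel, then run the commands that
# `get_homologues.pl ... -m dryrun` wrote to a batch file.
# ASSUMPTION: the batch file path is whatever the dryrun run reports; the
# helper names here are hypothetical.
set -euo pipefail

# Number of non-empty lines, i.e. pending commands, in the batch file.
count_jobs() {
  grep -c . "$1" || true
}

run_batch() {
  local batch="$1" jobs="${2:-4}"
  if [ ! -s "$batch" ]; then
    echo "no pending commands in $batch" >&2
    return 1
  fi
  # Step 1: is GNU parallel available?
  if ! command -v parallel >/dev/null 2>&1; then
    echo "GNU parallel not found; please install it first" >&2
    return 127
  fi
  # Step 2: each line of the batch file becomes one parallel job.
  echo "running $(count_jobs "$batch") commands, $jobs at a time" >&2
  parallel -j "$jobs" < "$batch"
}

# Only act when given arguments: run_batch.sh BATCH_FILE [JOBS]
if [ "$#" -gt 0 ]; then
  run_batch "$@"
fi
```

For instance, bash run_batch.sh Samples_homologues/dryrun.txt 8 (hypothetical script name and path) would run eight commands at a time. Feeding the file on stdin makes GNU parallel treat each line as a separate job, with -j capping how many run at once.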

anacristinareis commented 3 years ago

Ok, thanks.

Just to check: after installing parallel, is the command "./get_homologues.pl -d Samples -M -D -t0 -c -m dryrun", or do I not need to add the -M -D -t0 -c flags?

Thanks for all your help.

Ana Reis

(Sent by email in reply to brunocontrerasmoreira's comment of Friday, 21/05/2021, 18:36.)

brunocontrerasmoreira commented 3 years ago

./get_homologues.pl -d Samples -M -D -t0 -c -m dryrun

anacristinareis commented 3 years ago

Thank you very much for your time and help.

Best regards, Ana Reis

(Sent by email in reply to brunocontrerasmoreira's comment of Friday, 21/05/2021, 19:41.)