gengit / PosiGene

An error has occured during execution... #3

Open nancydongxyz opened 6 years ago

nancydongxyz commented 6 years ago

Hello!

I am trying to run a PosiGene analysis using this command:

perl PosiGene.pl -o=test_run_new -tn=8  -as=AC   -rs=AC:./Mollusks/AC_subset_renamed.fasta  -ts=LS:./Mollusks/LS_subset_renamed.fasta  -nhsbr=BG:./Mollusks/BG_subset_renamed.fasta,AC:./Mollusks/AC_subset_renamed.fasta,CG:./Mollusks/CG_subset_renamed.fasta,LG:./Mollusks/LG_subset_renamed.fasta,LS:./Mollusks/LS_subset_renamed.fasta,NV:./Mollusks/NV_subset_renamed.fasta -min_ident=20

I encountered this error:

Step 4/7, 2/8 threads returned
Step 4/7, 3/8 threads returned
Step 4/7, processing gene 36/47: Locus_AC_38...
Step 4/7, processing gene 37/47: Locus_AC_21...
Step 4/7, processing gene 38/47: Locus_AC_1...
Step 4/7, processing gene 39/47: Locus_AC_28...
Step 4/7, processing gene 40/47: Locus_AC_22...
Step 4/7, processing gene 41/47: Locus_AC_11...
Step 4/7, processing gene 42/47: Locus_AC_44...
Step 4/7, processing gene 43/47: Locus_AC_29...
Step 4/7, processing gene 44/47: Locus_AC_18...
Step 4/7, processing gene 45/47: Locus_AC_34...
Step 4/7, processing gene 46/47: Locus_AC_46...
Step 4/7, processing gene 47/47: Locus_AC_7...
Step 4/7, 4/8 threads returned
Step 4/7, 5/8 threads returned
Step 4/7, 6/8 threads returned
Step 4/7, 7/8 threads returned
Step 4/7, 8/8 threads returned
Step 4/7, creating concatenated alignment...
An error has occured during execution...

I don't see anything that jumps out at me in the log files. Can you please let me know what could be the problem?

By the way, I was able to run the test command from the User Manual without any problems.

Thank you very much!

Nancy

eroycroft commented 6 years ago

Hi there,

I get the same error at the same point in the analysis. Did you find a solution Nancy?

Cheers, Emily

nancydongxyz commented 6 years ago

Hello Emily,

Yes, I did resolve the issue. It turns out that the subset of CDS I used did not contain a sufficient number of orthologs between species. When I used the full CDS set, as downloaded from NCBI Nucleotide (following the instructions in the PosiGene user's manual), I was able to run the job successfully.

Hope this helps!

Nancy

Mcroze commented 6 years ago

Hello,

I have the same problem. I get this message when I try to analyze the test data provided with PosiGene: "Step 4/7, creating concatenated alignment... An error has occured during execution..."

Do you know what the reason could be? Do you have a solution? Thanks. Cheers, Myriam

asahm commented 6 years ago

Dear Myriam,

Sorry for the inconvenience. There is probably a problem with the input files.

One thing you can easily check is whether your sequence headers are longer than 25 characters. If so, you should shorten them. If renaming all sequences in your input files is difficult for you, you can send me one of those files (or a sample) and tell me how you want the headers shortened, so that I can write a little command for you.
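
A minimal sketch for spotting over-long headers with awk. It assumes the 25-character limit applies to the whole header text after ">" (the thread does not say whether only the ID before the first space counts), and the function name find_long_headers is hypothetical.

```shell
# find_long_headers FILE [MAX]
# Prints "FILE:LINE: header" for every FASTA header whose text after ">"
# is longer than MAX characters (default 25).
find_long_headers() {
    local file="$1" max="${2:-25}"
    awk -v max="$max" '
        /^>/ {
            header = substr($0, 2)              # header text without ">"
            if (length(header) > max + 0)
                print FILENAME ":" FNR ": " header
        }' "$file"
}
```

For example, find_long_headers Mollusks/AC_subset_renamed.fasta prints nothing if every header is within the limit.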

If that is not the problem, I can offer to investigate the issue so that I can provide a solution for you. For this I would need the following:

cd [result output folder]
mkdir sample
for a in $(ls individual_results | head -n 20); do cp -r individual_results/$a sample/; done

If you do not want your files to be publicly available here on GitHub, just send me an email with the things mentioned above.

Best regards Arne

Mcroze commented 6 years ago

Dear Arne,

Thanks for your answer. I actually have this problem with the test data. I ran the command given in README.txt:

perl PosiGene.pl -o=test -as=Harpegnathos_saltator -tn=10 -rs=Acromyrmex_echinatior:test_data/Acromyrmex_echinatior_sample.fasta -nhsbr=Acromyrmex_echinatior:test_data/Acromyrmex_echinatior_sample.fasta,Atta_cephalotes:test_data/Atta_cephalotes_sample.fasta,Camponotus_floridanus:test_data/Camponotus_floridanus_sample.fasta,Harpegnathos_saltator:test_data/Harpegnathos_saltator_sample.fasta,Linepithema_humile:test_data/Linepithema_humile_sample.fasta,Pogonomyrmex_barbatus:test_data/Pogonomyrmex_barbatus_sample.fasta,Solenopsis_invicta:test_data/Solenopsis_invicta_sample.fasta

Here is an example of an output file that I got:

add_a_public_non_homologene_species_to_reference_Solenopsis_invicta_sample.fasta.log

The files such as Solenopsis_invicta_sample.fasta.translation_VS_Acromyrmex_echinatior_sample.fasta.translation are empty (0 KB). The files such as Solenopsis_invicta_sample.fasta.translation, on the other hand, have been built. Here is an example of what I got for this file:

>SI2.2.0_06406 Solenopsis_invicta(SI2.2.0_06406, predicted mRNA)
MNEANELPPGTASLLEELEKKLMVLLRDGRTLIGYLKSVDQFANIVLQSTIERIHVGQEY
GDIPRGIFIVRGENVVLLGEIDIEKEKVLPLKKVTVDEILDAQRREQESKQEQKKRINKA
LKERGFAYIPDLSHDDMY
>SI2.2.0_10993 Solenopsis_invicta(SI2.2.0_10993, predicted mRNA)
MRLYLVLMTIICLLWDRNERLALANKSAEPQPEYRPDGPLVMCKFLPKEFVECEDPVDHK
GNKTAKEETGFGCVKFGGSRYEDVEKTKVSCTVLPDIECFGPRTFFREGIPCIKYSDHYF
ATTLLYSILLGFLGMDRFCLGQTGTAVGKLLTLGGMGVWWIVDVILLVTNSLQPEDGSNW
NPY
>SI2.2.0_07567 Solenopsis_invicta(SI2.2.0_07567, predicted mRNA)
MDTESGNNDSGISSSISDEPCNSSKPMLPKHGDITQSKSHGKQAGCVRKMSIMFEEDDDV
SDDADSDVMSIHHQHQKRSEMEAEWSSEEEKGRLLNAAREILVVPPDNGNNDRRVISRED
TPESVRRDKGHRWRPQPRLMVDQTESASSSDRDNGLHSPGRSSTRSIDPRHCYHHHHHHC
PNVRSLKAFNMDENQQHCSCCHQLSPTWSSALYNGSQARSFPDTVSIRSLTSIGLGSSDG

If you need more information or files, let me know.

Best, Myriam

asahm commented 6 years ago

Dear Myriam,

Sorry, I overlooked that you are having this problem with the test data. My answer above is still valid for everyone who runs the test case successfully but has problems at this step with their own data.

To be honest, I cannot remember any case in which the test run failed, and identifying an error on a remote system is generally quite hard. Thanks to your own good analysis, however, we can already be almost certain that the problem is connected to the BLAST step, since the file Solenopsis_invicta_sample.fasta.translation looks as it should but Solenopsis_invicta_sample.fasta.translation_VS_Acromyrmex_echinatior_sample.fasta.translation does not.
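
Empty pairwise BLAST result files are the symptom described here. A minimal sketch for listing them, assuming they are the files with "_VS_" in their names below the output's ortholog_assignment folder; the function name list_empty_blast_results is hypothetical.

```shell
# list_empty_blast_results [DIR]
# Lists zero-byte regular files whose names contain "_VS_" (the pairwise
# BLAST result files) below DIR (default: current directory), sorted.
list_empty_blast_results() {
    local dir="${1:-.}"
    find "$dir" -type f -name '*_VS_*' -empty | sort
}
```

For the test run this would be list_empty_blast_results test/ortholog_assignment; no output means no empty result files were found.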

The first thing that may have gone wrong is the creation of the BLAST database. Do the following files exist in your ortholog_assignment folder, and are they larger than 0 KB?

Solenopsis_invicta_sample.fasta.translation.phr
Solenopsis_invicta_sample.fasta.translation.pin
Solenopsis_invicta_sample.fasta.translation.psq

If not, does the following command create these files?

[path to PosiGene/bin/makeblastdb] -in [path to PosiGene/test/ortholog_assignment/Solenopsis_invicta_sample.fasta.translation] -dbtype protein
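
A small check for the three database files listed above, written as a shell function (the name check_blast_db is hypothetical). It only tests that each file exists and is non-empty, and it assumes a single-volume database; the numbered volume files that makeblastdb can create for very large inputs are not handled.

```shell
# check_blast_db PREFIX
# Verifies that PREFIX.phr, PREFIX.pin and PREFIX.psq exist and are non-empty.
# Prints each missing or empty file and returns 1 if any was found.
check_blast_db() {
    local prefix="$1" ext status=0
    for ext in phr pin psq; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext"
            status=1
        fi
    done
    return "$status"
}
```

Usage: check_blast_db test/ortholog_assignment/Solenopsis_invicta_sample.fasta.translation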

If the problem is not the BLAST database, please download the following file to the PosiGene/bin folder...

test_BlastP.pl

...execute the following command...

perl [path to PosiGene/bin/test_BlastP.pl] [path to PosiGene/test/ortholog_assignment/Solenopsis_invicta_sample.fasta.translation] [path to PosiGene/test/ortholog_assignment/Acromyrmex_echinatior_sample.fasta.translation] Solenopsis_invicta_sample.fasta.translation_VS_Acromyrmex_echinatior_sample.fasta.translation 10 test_

...and tell me what happens. If everything works normally, a non-empty Solenopsis_invicta_sample.fasta.translation_VS_Acromyrmex_echinatior_sample.fasta.translation file should be created in your current working directory, and the following should be written to the console:

1 sequences read...
41 sequences read.
Write output files...
Complete.
test_created 1/10 BLAST-threads
test_created 2/10 BLAST-threads
test_created 3/10 BLAST-threads
test_created 4/10 BLAST-threads
test_created 5/10 BLAST-threads
test_created 6/10 BLAST-threads
test_created 7/10 BLAST-threads
test_created 8/10 BLAST-threads
test_created 9/10 BLAST-threads
test_created 10/10 BLAST-threads
test_1/10 BLAST-threads returned
test_2/10 BLAST-threads returned
test_3/10 BLAST-threads returned
test_4/10 BLAST-threads returned
test_5/10 BLAST-threads returned
test_6/10 BLAST-threads returned
test_7/10 BLAST-threads returned
test_8/10 BLAST-threads returned
test_9/10 BLAST-threads returned
test_10/10 BLAST-threads returned
Delete temporary files...

Mcroze commented 6 years ago

Dear Arne,

Thanks a lot for your help. I reinstalled PosiGene on my computer and now I get these files:

Solenopsis_invicta_sample.fasta.translation.phr (5.3 KB)
Solenopsis_invicta_sample.fasta.translation.pin (472 bytes)
Solenopsis_invicta_sample.fasta.translation.psq (20.6 KB)
Solenopsis_invicta_sample.fasta.translation_VS_Acromyrmex_echinatior_sample.fasta.translation (4.7 KB)

I ran test_BlastP.pl (https://gist.github.com/asahm/1e5dad50e1cca17c3e31e604c56f3d1a#file-test_blastp-pl) and got this:

1 sequences read...
41 sequences read.
Write output files...
Complete.
test_created 1/10 BLAST-threads
test_created 2/10 BLAST-threads
test_created 3/10 BLAST-threads
test_created 4/10 BLAST-threads
test_created 5/10 BLAST-threads
test_created 6/10 BLAST-threads
test_created 7/10 BLAST-threads
test_created 8/10 BLAST-threads
test_created 9/10 BLAST-threads
test_created 10/10 BLAST-threads
test_1/10 BLAST-threads returned
test_2/10 BLAST-threads returned
test_3/10 BLAST-threads returned
test_4/10 BLAST-threads returned
test_5/10 BLAST-threads returned
test_6/10 BLAST-threads returned
test_7/10 BLAST-threads returned
test_8/10 BLAST-threads returned
test_9/10 BLAST-threads returned
test_10/10 BLAST-threads returned
Delete temporary files...

The file Solenopsis_invicta_sample.fasta.translation_VS_Acromyrmex_echinatior_sample has been created.

I ran the command again:

perl PosiGene.pl -o=test1 -as=Harpegnathos_saltator -tn=10 -rs=Acromyrmex_echinatior:test_data/Acromyrmex_echinatior_sample.fasta -nhsbr=Acromyrmex_echinatior:test_data/Acromyrmex_echinatior_sample.fasta,Atta_cephalotes:test_data/Atta_cephalotes_sample.fasta,Camponotus_floridanus:test_data/Camponotus_floridanus_sample.fasta,Harpegnathos_saltator:test_data/Harpegnathos_saltator_sample.fasta,Linepithema_humile:test_data/Linepithema_humile_sample.fasta,Pogonomyrmex_barbatus:test_data/Pogonomyrmex_barbatus_sample.fasta,Solenopsis_invicta:test_data/Solenopsis_invicta_sample.fasta

and I get the same error:

Step 4/7, processing gene 48/48: Hsal_01748...
Step 4/7, 2/10 threads returned
Step 4/7, 3/10 threads returned
Step 4/7, 4/10 threads returned
Step 4/7, 5/10 threads returned
Step 4/7, 6/10 threads returned
Step 4/7, 7/10 threads returned
Step 4/7, 8/10 threads returned
Step 4/7, 9/10 threads returned
Step 4/7, 10/10 threads returned
Step 4/7, creating concatenated alignment...
An error has occured during execution...

Best, Myriam

2018-04-09 20:25 GMT+02:00 asahm notifications@github.com:

Closed #3 https://github.com/gengit/PosiGene/issues/3.

-- Myriam Croze Post-doctorante

UMR 5288 - AMIS Université Paul Sabatier/CNRS Faculté de Médecine Purpan 37 allées Jules Guesde 31073 Toulouse-France

Email: myriam.croze07@gmail.com

asahm commented 6 years ago

Dear Myriam,

What I learned from your last comment is that the program behaves randomly on your machine, at least with regard to whether the file Solenopsis_invicta_sample.fasta.translation_VS_Acromyrmex_echinatior_sample.fasta.translation is filled or not. I assume that in your last run, other *.translation_VS_*.translation files were empty instead of Solenopsis_invicta_sample.fasta.translation_VS_Acromyrmex_echinatior_sample.

At the moment, the only thing I can imagine causing random behavior on your machine is parallelization. You can try to run the test command with -tn=3 or -tn=1 and see whether the program runs through under these conditions. I say this because there is a report describing program crashes when too many threads are used to analyze very small data sets or on machines with a low number of CPUs.

Otherwise, all I can offer is that you send me all files created by the test command. To be honest, however, I am quite skeptical that I can find out why the program behaves randomly on your system without having access to it.

Best regards and sorry for the disappointment Arne

Mcroze commented 6 years ago

Dear Arne,

I tried the command with -tn=1 and got the same error. I also tried to run PosiGene on my university's server and got the same error. The file "Solenopsis_invicta_sample.fasta.translation_VS_Acromyrmex_echinatior_sample" was created and is not empty.

Is it possible to run only the "positive selection" mode? I already have alignments of all the ortholog sequences, as well as the tree for the species I am interested in. I just want to check whether my genes are under selection and, if so, on which branches.

Thanks again. Best,

Myriam

asahm commented 6 years ago

Dear Myriam,

Is it possible to only do the mode "positive selection"?

For this, you would have to reproduce exactly the file structure that the program expects.

Anyway, I found a way to reproduce exactly the error pattern you described on my own machines (just without randomness). Try simply...

chmod -R 744 [PosiGene directory]

...and then use the test command again. I assume it will work, then.

Explanation:

When I tested my own program, I downloaded both the .zip file and the .tar.gz file from GitHub, unpacked them with "unzip PosiGene-0.1.zip" and "tar -xzf PosiGene-0.1.tar.gz", respectively, and executed the test case successfully. I have now found out, however, that some graphical unpacking programs do not restore the original Unix file permissions, even if they are stored in the package (maybe those commands also do not always restore file permissions correctly; I have to check this). If the execute permissions on the executables are missing, the program crashes at the step you mentioned and produces empty *.translation_VS_*.translation files.
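
To check whether unpacking dropped the execute permissions, a sketch like the following lists files that lack the owner's execute bit (the function name list_non_executable is hypothetical). Pointing it at the PosiGene/bin folder is an assumption based on where makeblastdb lives in the paths earlier in this thread; not every file there is necessarily meant to be executable, so treat the output as a hint rather than a verdict.

```shell
# list_non_executable DIR
# Lists regular files below DIR whose owner execute bit (octal 100) is not set.
list_non_executable() {
    find "$1" -type f ! -perm -100 | sort
}
```

Usage: list_non_executable [PosiGene directory]/bin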

I really hope that this was the problem and will update the README accordingly.

Best regards and sorry for these inconveniences Arne

Mcroze commented 6 years ago

Dear Arne,

Thanks for the help! It finally works. I unzipped the file on another computer and it worked there, so I guess the problem was with my computer.

Best, Myriam

bblarsen commented 6 years ago

I am getting this error on any computer I try with the test data. It does not appear to be fixed, any ideas?

Alopias1988 commented 6 years ago

Hi Arne, I am having the same issue as Myriam where this happens when I run the test code:

Step 4/7, creating concatenated alignment... An error has occured during execution...

I tried this code:

chmod 744 -R [PosiGene directory]

but I am not sure what -R stands for. The command doesn't work when I add it.

Thanks!
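
-R makes chmod apply the mode recursively to the directory and everything below it. GNU chmod (usual on Linux) accepts it after the mode, as in the command quoted above, but BSD-derived versions such as the one on macOS stop reading options at the first non-option argument and then treat -R as a file name. That is one likely reason the command fails; putting the option first works with both. A sketch wrapping this in a function (the name make_tree_executable is hypothetical):

```shell
# make_tree_executable DIR
# Recursively sets mode 744 (owner rwx, group/others read) on DIR and
# everything below it. "-R" comes before the mode so that both GNU and
# BSD/macOS chmod parse it as an option.
make_tree_executable() {
    chmod -R 744 "$1"
}
```

Usage: make_tree_executable [PosiGene directory]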

FatihSarigol commented 6 years ago

The test run worked fine for me, but "Symbol based ortholog assignment" on my own samples failed with the same error at the end of the 4th phase. I tried several things, including all the possible solutions here, and noticed that it fails when gene names are identical across my samples (e.g. >Gene1 exists in all of them), but works when I add arbitrary, sample-specific "isoform" names to the beginning of each gene name (e.g. >Sample1|Gene1, >Sample2|Gene1).

Also, in my small test file I noticed that the translations of each sample's genes were exactly identical, which causes a failure during the 5th phase; in my full set, of course, that didn't happen. One other thing: when I run PosiGene without libgd, it naturally gives a warning and doesn't generate any PNG files, but when I run it with libgd it gives no warning and generates PNG files that are all empty. That is probably due to a connection problem between the two programs that I cannot find a way to fix, but it is not a big problem anyway. Thanks for PosiGene!
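
The renaming workaround described in this comment can be scripted. This sketch first reports IDs (the first word of each header) that occur in more than one input file, then writes a copy of a file with every header prefixed by a sample tag. Both function names are hypothetical, and the sketch joins tag and ID with an underscore rather than "|", following the later advice in this thread to avoid special characters; note that the prefix also counts toward the 25-character header limit.

```shell
# shared_ids FILE...
# Prints every sequence ID (first word after ">") that appears in more
# than one of the given FASTA files, one ID per line.
shared_ids() {
    local f
    for f in "$@"; do
        # IDs of one file, de-duplicated so repeats within a file don't count
        grep '^>' "$f" | sed 's/^>//; s/[[:space:]].*//' | sort -u
    done | sort | uniq -d
}

# prefix_headers TAG FILE
# Writes FILE to stdout with each header ">ID ..." rewritten as ">TAG_ID ...".
# TAG must not contain "&", which awk's sub() treats as "the matched text".
prefix_headers() {
    local tag="$1" file="$2"
    awk -v tag="$tag" '/^>/ { sub(/^>/, ">" tag "_") } { print }' "$file"
}
```

Usage: shared_ids Sample1.fasta Sample2.fasta, then prefix_headers Sample1 Sample1.fasta > Sample1_prefixed.fasta for each sample.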

jessstapley commented 4 years ago

Hi Arne, thanks for developing and maintaining PosiGene. I am having a similar problem. When I run PosiGene, I get the following error:

Step 4/7, 10/10 threads returned
Step 4/7, creating concatenated alignment...
An error has occured during execution...
Try to run the program again and use the parameter "-continue" to start again from the last valid point of execution...

I have uploaded my fasta files (for one chr) and my code to https://github.com/jessstapley/test_posigene I hope I have provided you with enough information.

I have checked that PosiGene works properly: with the test data it runs without errors and produces the result files. I have also checked that my gene names are shorter than 25 characters. My gene names are the same across individuals, so I tried running it again with edited gene names (in each sample.fasta, all gene names are different), but I still get the same error.

Can you please help me to figure out what I am doing wrong?

Thanks Jessica

bbalog87 commented 4 years ago

Dear Arne, I am getting a similar error while running PosiGene with GenBank files. The command that throws this error is the following:

perl PosiGene.pl -ts=PFLU,PFLA,SLUC,SVIT,ESPE,ECRA \
    -hs=SLUC.gbk,ESPE.gb,ECRA.gb,PFLU.gb,PFLA.gb,SVIT.gbk \
    -nhs=ELAN:ELAN.cds.fa,CGOB:CGOB.cds.fa,LCAL:LCAL.cds.fa \
    -as=PFLA -tn=80

My aim is to test for positive selection in the LCA of these six species, which all belong to the same taxonomic family: PFLU,PFLA,SLUC,SVIT,ESPE,ECRA

The error message at the crash is:

Step 1/7, parse HomoloGene...
Step 1/7 (species-file 1/6), read sequences from /projekte/I3-PikeperchAssembly/PosiGene/PosiGene/ESPE.gb...
Step 1/7 (species-file 1/6), read 500 sequences from /projekte/I3-PikeperchAssembly/PosiGene/PosiGene/ESPE.gb
An error has occured during execution...
Try to run the program again and use the parameter "-continue" to start again from the last valid point of execution...

It's the same issue as mentioned above; no further information or files were generated.

Can you please track what's going wrong?

Best, Julien

asahm commented 4 years ago

Dear Jessica (and all with similar problems),

The most common cause for this error is that PosiGene is a bit sensitive to the naming of sequences. If this occurs, try to name the genes/sequences uniquely and without special characters.

In your case, this is almost certainly what causes the crash. Your sequence IDs look like this: 1:6191715-6191916. I suggest replacing at least the colon and the hyphen with underscores. The safest way would be to number the sequences and join them with a species ID, e.g. 1_Efcds, 2_Efcds, ....
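
The numbering scheme suggested here (1_Efcds, 2_Efcds, ...) can be applied with awk. In this sketch the function name renumber_fasta is hypothetical, and the tab-separated map file from new IDs back to the original headers is an addition not mentioned in the thread, meant only for tracing results back; PosiGene itself does not need it.

```shell
# renumber_fasta TAG IN OUT MAP
# Replaces each header in IN with ">N_TAG" (N = 1, 2, ...), writes the
# renamed FASTA to OUT and a "new<TAB>old header" table to MAP.
renumber_fasta() {
    local tag="$1" in="$2" out="$3" map="$4"
    awk -v tag="$tag" -v map="$map" '
        /^>/ {
            n++
            new = n "_" tag
            print new "\t" substr($0, 2) > map   # remember the original header
            print ">" new
            next
        }
        { print }' "$in" > "$out"
}
```

Usage: renumber_fasta Efcds input.fasta renamed.fasta id_map.tsv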

You should also take another look at how you created your sequences. The FASTA files contain characters that should not be there ("--") and extremely short coding sequences (<10 nucleotides).
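
Both problems mentioned here can be flagged before a run. A minimal sketch, assuming one or more sequence lines per record and a default minimum of 10 nucleotides taken from this comment; the function name flag_suspicious_cds is hypothetical.

```shell
# flag_suspicious_cds FILE [MIN_LEN]
# Prints "ID<TAB>length" for every sequence that contains a "-" or is
# shorter than MIN_LEN characters (default 10).
flag_suspicious_cds() {
    local file="$1" min="${2:-10}"
    awk -v min="$min" '
        function report() {
            if (id != "" && (seq ~ /-/ || length(seq) < min + 0))
                print id "\t" length(seq)
        }
        /^>/ { report(); id = substr($1, 2); seq = ""; next }
        { seq = seq $0 }
        END { report() }' "$file"
}
```

Usage: flag_suspicious_cds my_species.fasta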

Best regards and thanks for uploading your files right away Arne

asahm commented 4 years ago

Dear Julien,

-hs=SLUC.gbk,ESPE.gb,ECRA.gb,PFLU.gb,PFLA.gb,SVIT.gbk

These must be species that are part of the HomoloGene database (-homologene_species|-hs). If a species is not included in this database, you should use the reference-species-based ortholog assignment instead, as in the ant example included in the README. You could modify your command like this, for example:

perl PosiGene.pl -ts=PFLU,PFLA,SLUC,SVIT,ESPE,ECRA -rs=SLUC.gbk -nhsbr=SLUC.gbk,ESPE.gb,ECRA.gb,PFLU.gb,PFLA.gb,SVIT.gbk,ELAN:ELAN.cds.fa,CGOB:CGOB.cds.fa,LCAL:LCAL.cds.fa -as=PFLA -tn=80

Best, Arne

kerrygendreau commented 3 years ago

My analysis was also crashing at this step:

Step 4/7, creating concatenated alignment... An error has occured during execution...

I resolved the problem by changing my anchor species. Both the old and new anchor species have well-annotated genomes, so I am not sure what the exact problem was, but this is a possible solution if anyone else gets stuck here.

hdowney712 commented 3 years ago

Hi Arne:

My analysis has been erroring out at step 4 (when creating the concatenated alignment). I tried modifying a few things but step 4 is still the furthest the program has been able to run.

The naming issue does not apply in my case: none of the locus tags contain special characters, and all are under the 25-character limit. I also switched from .gbk files to .fasta, but I got the same error with both.

I am using a prokaryotic species and the genome sequences contain pseudogenes... Could that be a problem, or is it something else that I am overlooking?

Thank you, Hwaylee

asahm commented 3 years ago

Dear PosiGene users,

Another problem that often leads to this error is specifying a file path for anchor_species. Please use the species name for anchor_species (as well as for target_species). From the help:

-anchor_species|-as -> The NAME of your chosen anchor species. Must be part of
the passed species set.

-target_species|ts -> Comma separated NAME list of your chosen target species.
Must be part of the passed species set.
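
A quick sanity check for this mistake, assuming the species set is passed with -nhsbr in the NAME:file form used earlier in this thread (entries without a colon, such as bare .gbk file names, are not handled). The function check_species_names is hypothetical; it reports every -as or -ts value that is not one of the species names.

```shell
# check_species_names NHSBR ANCHOR TARGETS
#   NHSBR   value given to -nhsbr, e.g. "AC:./a.fasta,LS:./b.fasta"
#   ANCHOR  value given to -as
#   TARGETS value given to -ts (comma-separated)
# Prints each anchor/target value that is not a species name (the text
# before ":" in each -nhsbr entry); returns 1 if any is unknown.
check_species_names() {
    local nhsbr="$1" anchor="$2" targets="$3" names name status=0
    names=$(printf '%s\n' "$nhsbr" | tr ',' '\n' | cut -d: -f1)
    for name in "$anchor" $(printf '%s\n' "$targets" | tr ',' ' '); do
        if ! printf '%s\n' "$names" | grep -qxF -- "$name"; then
            echo "not a species name from -nhsbr: $name"
            status=1
        fi
    done
    return "$status"
}
```

With the values from the first command in this thread, check_species_names "BG:./Mollusks/BG_subset_renamed.fasta,AC:./Mollusks/AC_subset_renamed.fasta,LS:./Mollusks/LS_subset_renamed.fasta" AC LS prints nothing, while passing ./Mollusks/AC_subset_renamed.fasta as the anchor would be reported.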

Best, Arne