carnegie / PlantClusterFinder

GNU General Public License v3.0

Verbose2 #6

Closed Frybank closed 1 year ago

Frybank commented 1 year ago

I'm very sorry, I don't know why the previous question was suddenly closed. After adding the "Verbose2" parameter, the following error message was displayed. Following your instructions, I checked for the GAP_output file, and it was generated like the example file. I noticed that the number of N's in the temp file of my genome is very small, only 87, whereas the number of N's in the test data is very large, as high as 19,000. Could this be the problem? I look forward to your reply.

Next command running: /home/wyh/miniconda3/bin/awk -f enter_new_line_characters_in_fasta_file.awk '/home/wyh/software/PlantClusterFinder/c.genome.fasta'>'/home/wyh/software/PlantClusterFinder/c.genome.fasta_temp1'
Status was 0
Output was:
Next command running: ' < '/home/wyh/software/PlantClusterFinder/c.genome.fasta_temp1'>'/home/wyh/software/PlantClusterFinder/c.genome.fasta_temp2'
Status was 0
Output was:
Next command running: tr -d ' ' < '/home/wyh/software/PlantClusterFinder/c.genome.fasta_temp2'>'/home/wyh/software/PlantClusterFinder/c.genome.fasta_temp3'
Status was 0
Output was:
Next command running: /home/wyh/miniconda3/bin/awk -f get_new_line_in.awk '/home/wyh/software/PlantClusterFinder/c.genome.fasta_temp3'>'/home/wyh/software/PlantClusterFinder/c.genome.fasta_temp4'
Status was 0
Output was:
Next command running: /home/wyh/miniconda3/bin/awk -f get_positions_of_gap.awk '/home/wyh/software/PlantClusterFinder/c.genome.fasta_temp4'>'/home/wyh/software/PlantClusterFinder/c.genome.fasta_GAPOutput'
Status was 0
Output was:
Error using PlantClusterFinder>f_analyze_PlantClusterGapFile
Too many input arguments.
Error in PlantClusterFinder>f_get_Sequencing_Gaps (line 2168)
Error in PlantClusterFinder>f_annotate_Sequencing_Gaps (line 1984)
Error in PlantClusterFinder (line 1124)
MATLAB:TooManyInputs

Frybank commented 1 year ago

Quoted reply:
The values in f_annotate_Sequencing_Gaps that relate to files are the masked DNA file and the "_GAPoutput" file generated from it. From the code itself, the error is likely related to the GAPoutput file, but based on your reply that file is generated. To make the error easier to pinpoint, can you do a clean run this time? Remove all the temporary files generated during the previous runs, such as the temp files and GAPOutput, and use "Verbose 2" so that we have more information. Then check whether the GAPOutput file is generated this time and whether its content is correct: for example, does the file start with empty lines? Does it have the same 4-column structure as "CsubellipsoideaC_169_227_v2.0.hardmasked.fa_GAPOutput"?

I captured the number of N's in my GAP_output, which is more than 3000, and in the test data it is more than 4000. In theory, should this number be proportional to the number of N's in the expected temp file?

bxuecarnegie commented 1 year ago

The GAPoutput file counts the number of N's from each starting position. So, for example, your first line means that on chromosome chr1, starting from the 10814449th base of the temp4 file, the next 3000 characters are N's. Based on the output, the error is related to the '_GAPOutput_count.txt' file. Can you show me the contents of your output folder and what's in the '_GAPOutput_count.txt' file?
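To make the "start position plus number of N's" idea in the comment above concrete, here is a minimal Python sketch that locates runs of N in a sequence. It is not the awk code that PlantClusterFinder ships: the function name `find_n_runs`, the 1-based positions, and the toy sequence are assumptions made for illustration only.

```python
import re
from typing import Iterator, Tuple


def find_n_runs(sequence: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, length) for every run of uppercase N in the sequence.

    Positions are 1-based in this sketch; whether the real GAPOutput
    file is 0- or 1-based is an assumption, not something the thread states.
    """
    for match in re.finditer(r"N+", sequence):
        yield match.start() + 1, match.end() - match.start()


if __name__ == "__main__":
    toy = "ACGTNNNNACGTNNAC"  # made-up sequence, not real data
    for start, length in find_n_runs(toy):
        # Hypothetical layout: chromosome, start position, run length
        print(f"chr1\t{start}\t{length}")
```

Read this way, a single gap record can account for thousands of N's, so the number of records and the total number of N's in the genome need not match.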

Frybank commented 1 year ago

These are all the output files (screenshot).
This is the GAPoutput file: https://github.com/Frybank/PGDB/blob/main/c.genome.fasta_GAPOutput

bxuecarnegie commented 1 year ago

I was able to run f_analyze_PlantClusterGapFile directly on your file with no issue. I'm not sure what is causing the problem at this point; the only fix I can think of is to change the string concatenation from square brackets to the strcat function. I've pushed the changes, so please see if that fixes the issue.

Frybank commented 1 year ago

I ran the file you modified, and this is the result:
Error using fgets
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in fgetl (line 33)
Error in PlantClusterFinder>f_annotate_Sequencing_Gaps (line 1990)
Error in PlantClusterFinder (line 1124)
MATLAB:FileIO:InvalidFid

Frybank commented 1 year ago

I get this error, and no GAP_output data was generated. Is it a problem with the file? I don't understand "Invalid file identifier". Where exactly is the problem?

bxuecarnegie commented 1 year ago

What do you mean it didn't generate GAP_output data? Were the other temp files also not generated? It's hard for me to pinpoint your issue because I can't replicate errors that differ from run to run.

Frybank commented 1 year ago

The run prompts this error and does not produce the GAP_output file.

Frybank commented 1 year ago

Other temporary files are generated, but there is no output file.

bxuecarnegie commented 1 year ago

Then you're encountering a new error that precedes the one we were trying to fix. Can you copy more of the error output and, in the meantime, run with "Verbose 2" again?

Frybank commented 1 year ago

OK, I'll add the Verbose 2 parameter again and run it.

Frybank commented 1 year ago

I re-ran the program and got the following message. This time the GAP_output data was produced, but no gene cluster output data:
Error using PlantClusterFinder>f_analyze_PlantClusterGapFile
Too many input arguments.
Error in PlantClusterFinder>f_get_Sequencing_Gaps (line 2168)
Error in PlantClusterFinder>f_annotate_Sequencing_Gaps (line 1984)
Error in PlantClusterFinder (line 1124)
MATLAB:TooManyInputs

bxuecarnegie commented 1 year ago

Would you be able to share all your input files and your command with me? At this point I don't think I can pinpoint the issue unless I can replicate the error myself.

Frybank commented 1 year ago

All the data? Including the pf, DNA, protein, glof, and gtpf files, or just the output?

bxuecarnegie commented 1 year ago

All the inputs that you included in the command you ran.

Frybank commented 1 year ago

Dear author, the data is 1 GB in size. How should I send it to you?

bxuecarnegie commented 1 year ago

Do you have Google Drive or any other web drive?

Frybank commented 1 year ago

I am in China. Could you use a cloud drive such as Baidu Netdisk?

bxuecarnegie commented 1 year ago

Maybe? I think it’s accessible in the US

Frybank commented 1 year ago

Here is the Baidu Netdisk link. I can also use Windows OneDrive here, if that works for you too.
Link: https://pan.baidu.com/s/1gMuAbGZLGAvEVAaCmDazDQ
Extraction code: nyfo

Frybank commented 1 year ago

OneDrive may require one of your email addresses so that I can share the data with you.

bxuecarnegie commented 1 year ago

I'm missing your gff and annotation_info files. In the meantime, I've changed some of the file path checking in the .m file. You can give it a try and see if that fixes things. Note that this was compiled using MATLAB R2023b, so you might need to recompile it by running the command "mcc -m PlantClusterFinder.m -a get_new_line_in.awk -a get_positions_of_gap.awk -a enter_new_line_characters_in_fasta_file.awk" from the PlantClusterFinder folder in MATLAB.

Frybank commented 1 year ago

I set up R2023b as you suggested, but the run reports that MATLAB is missing a library:
LD_LIBRARY_PATH is .:/usr/local/MATLAB/R2023b/MATLAB_Runtime/v23.2/runtime/glnxa64:/usr/local/MATLAB/R2023b/MATLAB_Runtime/v23.2/bin/glnxa64:/usr/local/MATLAB/R2023b/MATLAB_Runtime/v23.2/sys/os/glnxa64:/usr/local/MATLAB/R2023b/MATLAB_Runtime/v23.2/sys/opengl/lib/glnxa64
./PlantClusterFinder: error while loading shared libraries: libmwlaunchermain.so: cannot open shared object file: No such file or directory
I ran the compile command (mcc -m PlantClusterFinder.m -a get_new_line_in.awk -a get_positions_of_gap.awk -a enter_new_line_characters_in_fasta_file.awk), and no error was reported.

Frybank commented 1 year ago

This is the script command I use with the test data:
./run_PlantClusterFinder.sh /usr/local/MATLAB/R2023b/MATLAB_Runtime/v23.2 -pgdb "/home/wyh/software/PlantClusterFinder/csubellipsoidea/pgdb/csubellipsoideacyc/1.0/data" -rmdf "/home/wyh/software/PlantClusterFinder/Inputs/ReactionMetabolicDomainClassification.txt" -md "{'Amines and Polyamines Metabolism'; 'Amino Acids Metabolism'; 'Carbohydrates Metabolism'; 'Cofactors Metabolism'; 'Detoxification Metabolism'; 'Energy Metabolism'; 'Fatty Acids and Lipids Metabolism'; 'Hormones Metabolism'; 'Inorganic Nutrients Metabolism'; 'Nitrogen-Containing Compounds'; 'Nucleotides Metabolism'; 'Phenylpropanoid Derivatives'; 'Polyketides'; 'Primary-Specialized Interface Metabolism'; 'Redox Metabolism'; 'Specialized Metabolism'; 'Sugar Derivatives'; 'Terpenoids'}" -psf "/home/wyh/software/PlantClusterFinder/csubellipsoidea/CsubellipsoideaC_169_227_v2.0.protein.pcf13.fa" -gtpf "/home/wyh/software/PlantClusterFinder/csubellipsoidea/gtpf_CsubellipsoideaC_169_227_v2.0.annotation_info.txt.txt" -glof "/home/wyh/software/PlantClusterFinder/csubellipsoidea/glof_CsubellipsoideaC_169_227_v2.0.gene.gff3.txt" -dnaf "/home/wyh/software/PlantClusterFinder/csubellipsoidea/CsubellipsoideaC_169_227_v2.0.hardmasked.fa" -sitf "/home/wyh/software/PlantClusterFinder/Inputs/scaffold-tailoring-reactions-05082016.tab" -gout "/home/wyh/software/PlantClusterFinder/testGene.txt" -cout "/home/wyh/software/PlantClusterFinder/testCluster.txt" SeqGapSizesChromBreak [10000] PGDBIdsToMap GTP

bxuecarnegie commented 1 year ago

That looks like an MCR installation error or a compilation issue. Does the version of MATLAB you ran the mcc command with match the MATLAB Runtime version in the path? Can you find the file 'libmwlaunchermain.so' in '/usr/local/MATLAB/R2023b/MATLAB_Runtime/v23.2/bin/glnxa64'?
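One way to answer the question above is to search every directory on the library path for the missing file. The following Python sketch does that; the helper name `find_library` is hypothetical, and it assumes a Linux-style, colon-separated path such as the LD_LIBRARY_PATH value printed by run_PlantClusterFinder.sh.

```python
import os
from typing import List


def find_library(name: str, search_path: str) -> List[str]:
    """Return each directory in a colon-separated path that contains `name`."""
    hits = []
    for directory in search_path.split(":"):
        # Skip empty entries, which appear when the path has a stray colon.
        if directory and os.path.isfile(os.path.join(directory, name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Reads the current environment; the wrapper script may set a different
    # value, in which case paste the path it prints instead.
    path = os.environ.get("LD_LIBRARY_PATH", "")
    found = find_library("libmwlaunchermain.so", path)
    if found:
        print("found in:", ", ".join(found))
    else:
        print("libmwlaunchermain.so not found in any listed directory")
```

If no directory contains the file, the runtime path passed to the wrapper script probably does not point at the installed MATLAB Runtime.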

Frybank commented 1 year ago

When I run mcrinstaller, it returns:

ans =

'/ home/wyh/MCRInstaller23.2 / apply MATLAB_Runtime_R2023b_glnxa64. Zip'

Frybank commented 1 year ago

After running mcrinstaller I have an MCRV23.2 folder, and I installed it manually with the ./install command.

Frybank commented 1 year ago

OK, I know what the problem was. I've fixed it, and it's working now.

Frybank commented 1 year ago

An error message is displayed when using the test data:
Too many input arguments.
Error in PlantClusterFinder>f_get_Sequencing_Gaps (line 2168)
Error in PlantClusterFinder>f_annotate_Sequencing_Gaps (line 1984)
Error in PlantClusterFinder (line 1124)
MATLAB:TooManyInputs

bxuecarnegie commented 1 year ago

I was able to run the pipeline on your files after the fixes. First test whether the pipeline runs on the example data, then remove all previously generated files and start a clean run.

Frybank commented 1 year ago

Sorry, I ran the test data, but the run prompts this message:
Read in gene position file
Map gene location to conversion gene-IDs
Find intergenic regions
Calculate size of sequence gap that should be populated by hypothetical genes
Identify sequencing gaps (bases encoded by N)
Check awk installation.
Get information gaps in genomes.
Error using PlantClusterFinder>f_analyze_PlantClusterGapFile
Too many input arguments.
Error in PlantClusterFinder>f_get_Sequencing_Gaps (line 2168)
Error in PlantClusterFinder>f_annotate_Sequencing_Gaps (line 1984)
Error in PlantClusterFinder (line 1124)
MATLAB:TooManyInputs

Frybank commented 1 year ago

This is my command:
./run_PlantClusterFinder.sh /usr/local/MATLAB/MATLAB_Runtime/R2023b -pgdb "/home/wyh/software/PlantClusterFinder/lcucyc/1.0/data" -rmdf "/home/wyh/software/PlantClusterFinder/Inputs/ReactionMetabolicDomainClassification.txt" -md "{'Amines and Polyamines Metabolism'; 'Amino Acids Metabolism'; 'Carbohydrates Metabolism'; 'Cofactors Metabolism'; 'Detoxification Metabolism'; 'Energy Metabolism'; 'Fatty Acids and Lipids Metabolism'; 'Hormones Metabolism'; 'Inorganic Nutrients Metabolism'; 'Nitrogen-Containing Compounds'; 'Nucleotides Metabolism'; 'Phenylpropanoid Derivatives'; 'Polyketides'; 'Primary-Specialized Interface Metabolism'; 'Redox Metabolism'; 'Specialized Metabolism'; 'Sugar Derivatives'; 'Terpenoids'}" -psf "/home/wyh/software/PlantClusterFinder/Lcuevm.pep.fasta" -gtpf "/home/wyh/software/PlantClusterFinder/gtpfx.txt" -glof "/home/wyh/software/PlantClusterFinder/glofx.txt" -dnaf "/home/wyh/software/PlantClusterFinder/c.genome.fasta" -sitf "/home/wyh/software/PlantClusterFinder/Inputs/scaffold-tailoring-reactions-05082016.tab" -gout "/home/wyh/software/PlantClusterFinder/LcuGene.txt" -cout "/home/wyh/software/PlantClusterFinder/LcuCluster.txt" SeqGapSizesChromBreak [10000] PGDBIdsToMap GTP Verbose 1

Frybank commented 1 year ago

Are double quotes around [10000] required? When I used the test data, I got the result printed without the double quotes.

bxuecarnegie commented 1 year ago

No, that shouldn't be the problem. Can you check whether the _temp and GAPOutput files are empty? If they are, it could be that GNU awk isn't installed.
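The emptiness check requested above, together with the empty-line and 4-column checks mentioned earlier in the thread, can be scripted. This is a rough Python sketch, not part of PlantClusterFinder: the function name `check_gap_file` is hypothetical, and it assumes the columns are separated by whitespace, which the thread does not confirm.

```python
import os
from typing import List


def check_gap_file(path: str, expected_columns: int = 4) -> List[str]:
    """Return a list of problems found in a GAPOutput-style file."""
    if not os.path.isfile(path):
        return [f"{path}: file does not exist"]
    if os.path.getsize(path) == 0:
        return [f"{path}: file is empty"]
    problems = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            fields = line.split()  # assumption: whitespace-separated columns
            if not fields:
                problems.append(f"line {number}: blank line")
            elif len(fields) != expected_columns:
                problems.append(
                    f"line {number}: {len(fields)} columns, expected {expected_columns}"
                )
    return problems


if __name__ == "__main__":
    # File name taken from the thread; adjust to your own -dnaf path.
    for problem in check_gap_file("c.genome.fasta_GAPOutput"):
        print(problem)
```

An empty result means the file exists, has content, and every line has the expected number of columns; it says nothing about whether the values themselves are correct.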

Frybank commented 1 year ago

My temp and GAP_output files have content and are not empty, and awk --version displays:
GNU Awk 5.2.2, API 3.2, PMA Avon 8-g1, (GNU MPFR 4.2.0, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2023 Free Software Foundation.
So that doesn't seem to be the problem. Could it be a system issue? My server runs Ubuntu 22.04.

bxuecarnegie commented 1 year ago

There was a file-check error that failed to detect the next output of GAP_output. I've pushed a new version to the repository.

Frybank commented 1 year ago

I used the new version you provided, but the test data could not be run:
Error using PlantClusterFinder>f_analyze_PlantClusterGapFile
Too many input arguments.
Error in PlantClusterFinder>f_get_Sequencing_Gaps (line 2168)
Error in PlantClusterFinder>f_annotate_Sequencing_Gaps (line 1984)
Error in PlantClusterFinder (line 1124)

bxuecarnegie commented 1 year ago
  1. Did you recompile the script from the repository, not from the "release" compressed file?
  2. Did you remove the previous temporary files?
  3. Were the temp_ files and GAPOutput files generated, and are they non-empty?
  4. Was the GAPOutput count file generated, and is it non-empty?
  5. Run it with Verbose 2.

At this point, unfortunately, I'm running out of options because I'm unable to replicate the errors you're encountering. You can try the other branches, which use different methods of generating the GAPOutput. Use "git checkout python_gap" or "git checkout gawk_gap" to switch to a different branch, recompile the script, and try again. If none of these work, I will have to close this issue because, again, your errors cannot be replicated.
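For the "remove previous temporary files" step in the checklist above, a small helper can list what is left over before a clean run. This is only a sketch: the names `leftover_files` and `clean` are hypothetical, and the assumption that every intermediate file is the -dnaf path plus a "_temp..." or "_GAPOutput..." suffix is inferred from the file names shown in this thread.

```python
import glob
import os
from typing import List


def leftover_files(dna_fasta: str) -> List[str]:
    """List intermediate files that sit next to the DNA FASTA file.

    Assumed naming: <dnaf>_temp1..N, <dnaf>_GAPOutput, <dnaf>_GAPOutput_count.txt
    """
    base = glob.escape(dna_fasta)  # protect special characters in the path
    found = []
    for suffix in ("_temp*", "_GAPOutput*"):
        found.extend(glob.glob(base + suffix))
    return sorted(found)


def clean(dna_fasta: str, dry_run: bool = True) -> List[str]:
    """Remove the leftovers; with dry_run=True only report them."""
    files = leftover_files(dna_fasta)
    for path in files:
        if dry_run:
            print("would remove", path)
        else:
            os.remove(path)
    return files


if __name__ == "__main__":
    # Dry run by default, so nothing is deleted until you opt in.
    clean("c.genome.fasta", dry_run=True)
```

Keeping the dry run as the default makes it easy to confirm that the input FASTA itself is not in the list before deleting anything.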

Frybank commented 1 year ago
Dear author, thank you very much for your careful guidance and your painstaking work correcting the program. I obtained the final result by downloading and using the python_gap branch. I would like to express my deep apologies and gratitude.

bxuecarnegie commented 1 year ago

That's awesome! Glad it finally worked out. I'll be closing the issue then.