galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Still need to remove @ from paths in perl script? #12133

Closed jaredbernard closed 1 year ago

jaredbernard commented 3 years ago

I ended up with this error message after using Maker in the Main Galaxy instance:

Possible unintended interpolation of @2 in string at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/…/lib/5.26.2/x86_64-linux-thread-multi/Config_heavy.pl line 260.
Possible precedence issue with control flow operator at /cvmfs/main.

I saw that others had this error with other tools (e.g. Trinity) within the last couple years. In one case, admin said the error meant insufficient memory was allotted to the job or that the inputs were wrong. Hoping this was the case, I checked the input and ran the job again, but got the same result.

In another case, @bernt-matthias said it was a miscommunication between Galaxy, which uses the “@” symbol in paths, and the Perl interpreter, which doesn’t know what to do with it. He suggested adding a “\” in front of the “@”, although I don’t know how to access the script to do this.

This person had a similar issue with Maker, and these people had this interpolation issue with other tools.

Any other ideas on how to handle this type of error? I still can’t tell whether it’s possible to salvage my results, which were apparently successfully run aside from the metadata problem that couldn’t be resolved. Or should I simply keep rerunning the job until it works?

I believe my inputs should be okay, as I'm primarily following the workflow of the Genome Annotation Tutorial.

Thanks for any ideas!

jaredbernard commented 3 years ago

I tried running this on another instance, Galaxy Europe, but got the same interpolation error.

hexylena commented 3 years ago

cc @abretaud (maker wrapper author) @bgruening for eu

abretaud commented 3 years ago

If I remember correctly, this error is a problem with the perl conda package, and is indeed a consequence of using the "@" symbol in the conda env path. However, I think the warning itself should be harmless: Maker should still be able to run and produce its result. Maybe there's another error in the job you launched. Do you see anything else in the logs?

jaredbernard commented 3 years ago

Thanks for getting back to me, @hexylena and @abretaud.

I also hoped it was just a harmless message about the ability to represent metadata, but that perhaps the results were still there. However, the results are "0 lines" long, and when I try to do anything downstream, such as train HMM, it fails because the results of Maker are empty.

Here is more detail on the error message from the log:

Possible unintended interpolation of @2 in string at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/../lib/5.26.2/x86_64-linux-thread-multi/Config_heavy.pl line 260.
Possible precedence issue with control flow operator at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Possible unintended interpolation of @2 in string at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/../lib/5.26.2/x86_64-linux-thread-multi/Config_heavy.pl line 260.
Possible precedence issue with control flow operator at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
STATUS: Parsing control files...
WARNING: Temporary directory set to an NFS location.
TMP=/jetstream/scratch0/main/jobs/35785058/_job_tmp
The temporary directory in MAKER is specifically for
operations that are not NFS-safe, but you have chosen
to ignore this error. If you experience seemly random
freezing and failures, the TMP directory is the cause.

STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/jetstream/scratch0/main/jobs/35785058/working/dataset_56790601.maker.output/dataset_56790601_datastore

To access files for individual sequences use the datastore index:
/jetstream/scratch0/main/jobs/35785058/working/dataset_56790601.maker.output/dataset_56790601_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >0, trying to re-index the fasta.
stop here: 0
ERROR: Fasta index error
 at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/../lib/Process/MpiChunk.pm line 239.

Process::MpiChunk::_prepare(Process::MpiChunk=HASH(0x55fee8689518), HASH(0x55fee8689b60), 0) called at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/../lib/Process/MpiTiers.pm line 73
    Process::MpiTiers::__ANON__() called at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/../lib/Error.pm line 415
    eval {...} called at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/../lib/Error.pm line 407
    Error::subs::try(CODE(0x55fee8683720), HASH(0x55fee8689968)) called at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/../lib/Process/MpiTiers.pm line 79
    Process::MpiTiers::_prepare(Process::MpiTiers=HASH(0x55fee862eb80)) called at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/../lib/Process/MpiTiers.pm line 56
    Process::MpiTiers::new("Process::MpiTiers", HASH(0x55fee862f240), 0, "Process::MpiChunk") called at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/maker line 672
--> rank=NA, hostname=jetstream-iu-large15
ERROR: Failed in tier preparation
examining contents of the fasta file and run log

I also see something in the log about not recognizing the species for the built-in RepeatMasker, so maybe that's a separate issue:

Species "drosophila" is not known to RepeatMasker.  There may
not be any TE families defined in the libraries for this
species/clade or there may be an error in the spelling.
Please check your entry against the NCBI Taxonomy database
and/or try using a broader clade or related species instead.
The full list of species/clades defined in the library may be
obtained using the famdb.py script.

ERROR: RepeatMasker failed
--> rank=NA, hostname=jetstream-iu-large15
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:1

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:1

I'm new to Galaxy, so I'm trying to figure out how to do things. I saw a post where @bernt-matthias had the same idea as you about the perl conda issue, so I'm currently trying the run on an older version of Maker (2.31.10) to see if it doesn't have the same interpolation problem with the "@" symbol.

But the fasta comment in the error log makes me wonder if the decompression of the fasta.gz also didn't work right.
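One way to rule out a bad decompression is to read the compressed file directly and count its sequence headers. The following is only a minimal sketch: the function name `count_fasta_records` and the file name `genome.fasta.gz` are made up for illustration, and the demo writes its own tiny file rather than using any dataset from this thread.

```python
import gzip
import os
import tempfile


def count_fasta_records(path):
    """Count '>' header lines in a gzip-compressed FASTA file.

    gzip raises an OSError while reading if the file is not valid gzip,
    so a clean return also means the archive could be decompressed.
    """
    n = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                n += 1
    return n


# Demo on a throwaway file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "genome.fasta.gz")
    with gzip.open(demo, "wt") as out:
        out.write(">contig1\nACGT\n>contig2\nGGCC\n")
    demo_count = count_fasta_records(demo)

print(demo_count)
```

A count of zero, or an exception while reading, would point at the input file rather than at Maker itself.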

I appreciate your feedback!

bgruening commented 3 years ago

@jaredbernard you are not running on usegalaxy.eu but on usegalaxy.org, aren't you?

ERROR: Fasta index error

This looks more like the error you should look for. Can you try to run the example at https://training.galaxyproject.org/training-material/topics/genome-annotation/ and see if Maker works for you on our training data?
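To make the real failure easier to spot, the two Perl warnings can be separated from the lines that start with `ERROR:`. This is a rough sketch, not part of Galaxy or Maker: the `triage` helper is hypothetical, and the sample text is just a few lines copied from the log quoted above.

```python
import re

# The two Perl compile-time warnings quoted in this thread.
PERL_WARNING = re.compile(
    r"^Possible (unintended interpolation|precedence issue)"
)


def triage(log_text):
    """Split a job log into Perl warnings and lines starting with 'ERROR:'."""
    perl_warnings, error_lines = [], []
    for line in log_text.splitlines():
        stripped = line.strip()
        if PERL_WARNING.match(stripped):
            perl_warnings.append(stripped)
        elif stripped.startswith("ERROR:"):
            error_lines.append(stripped)
    return perl_warnings, error_lines


sample = """Possible unintended interpolation of @2 in string at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/../lib/5.26.2/x86_64-linux-thread-multi/Config_heavy.pl line 260.
STATUS: Now running MAKER...
ERROR: Fasta index error
ERROR: Failed in tier preparation
"""

perl_warnings, maker_errors = triage(sample)
print(len(perl_warnings), "Perl warning(s) set aside")
for line in maker_errors:
    print(line)
```

With the warnings set aside, the first `ERROR:` line is usually the place to start reading.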

jaredbernard commented 3 years ago

Thanks for the reply, @bgruening.

I have used both usegalaxy.org and usegalaxy.eu, because someone suggested trying another instance.

Before trying my own data, I was working through the Genome annotation with Maker tutorial, and I ran into the exact same error with Maker.

Here is the error message I got when doing the tutorial, using the training data, and the exact steps shown in the tutorial:

Possible unintended interpolation of @2 in string at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin/…/lib/5.26.2/x86_64-linux-thread-multi/Config_heavy.pl line 260.
Possible precedence issue with control flow operator at /cvmfs/main.

bgruening commented 3 years ago

If you run the tutorial, please send an error report and share the history with us.

jaredbernard commented 3 years ago

Looks like you commented at the same moment as me. :^)

When I got the above error, either with the training data in the tutorial or with my own data, the result was an empty data set, so I could not proceed with subsequent steps.

bgruening commented 3 years ago

I see, can you share the history with us?

jaredbernard commented 3 years ago

I don't have my history of the tutorial anymore because I adapted it to my own data, and didn't want my storage used by training datasets. But as I mentioned, the steps were precisely as shown in the tutorial:

  1. Data upload (all S_pombe files from Zenodo)
  2. Fasta statistics on genome
  3. BUSCO
  4. 1st Maker round -- with genome selected, no re-annotation, eukaryotic organism, infer gene prediction from ESTs using Trinity assembly, infer gene prediction from protein alignment using Swissprot_no_pombe.fasta, no Augustus prediction, and disabled repeat masking.
  5. Could not continue to subsequent steps of workflow due to failure of Maker.

These steps resulted in the above @2 interpolation error when using the training data set.

I also got the same interpolation error when using my data.

By the way, I just tried the run again on a slightly older version of Maker (2.31.10 instead of 2.31.11), thinking that it could be a problem with an upgrade. But it ended with the same error:

Possible unintended interpolation of @2 in string at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.10/bin/../lib/5.26.2/x86_64-linux-thread-multi/Config_heavy.pl line 260.
Possible precedence issue with control flow operator at /cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.10/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
/jetstream/scratch0/main/jobs/35886380/command.sh: line 115: 18957 Aborted                 mpiexec -n ${GALAXY_SLOTS:-4} maker --ignore_nfs_tmp maker_opts.ctl maker_bopts.ctl maker_exe.ctl < /dev/null

Thank you so much for looking into this, @abretaud and @bgruening. Please let me know if there are any solutions, or something I can do.

jaredbernard commented 3 years ago

It appears to be working now, although I'm still checking the output.

I still get the @2 interpolation error, so I'm concerned that could cause problems downstream. But now the data sets are usable instead of empty. The difference seems to be that I didn't select a species for the Dfam database this time.

bernt-matthias commented 1 year ago

This is an issue of the software / conda. Paths need quoting. Closing this here.
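For illustration, here is roughly what such quoting amounts to, assuming the env path ends up pasted into a Perl double-quoted string, where an unescaped `@2` is read as an array. The helper `escape_for_perl_dq` is a hypothetical name, not something from conda or Galaxy, and the fix belongs in the packaged software, not in anything a Galaxy user can edit.

```python
def escape_for_perl_dq(text):
    """Backslash-escape characters that are special inside a Perl "..." string.

    Covers the backslash, the double quote, and the two sigils ($ and @)
    that Perl interpolates in double-quoted strings.
    """
    out = []
    for ch in text:
        if ch in '\\"$@':
            out.append("\\")
        out.append(ch)
    return "".join(out)


# Directory prefix taken from the warning quoted in this thread.
env_path = "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__maker@2.31.11/bin"
escaped = escape_for_perl_dq(env_path)
print(escaped)
```

The printed path contains `__maker\@2.31.11`, which Perl would read as a literal "@" instead of trying to interpolate `@2`.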