galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License

I want help with Making sense of a newly assembled genome tutorial #1983

Open Jalalalzanin opened 3 years ago

Jalalalzanin commented 3 years ago

Hello everyone, I ran into a problem with the output of the "Search in textfiles (grep)" tool in this tutorial, specifically with the “Regular Expression”: ^>. The output is empty. I don't have experience with regular expressions, but when I deleted the regular expression, the output of the tool showed three lines as follows:

CP020543.1 CP024090.1 LT906474.1 [image]

However, according to the tutorial it should look like this: [image]

CP020543.1 CP024090.1 LT906474.1

I continued the tutorial, but that caused problems in the next steps, in particular with the IGV browser.

Thank you in advance for your help.
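
For readers unfamiliar with the expression in this report: `^>` matches lines that start with `>`, which is how FASTA header lines begin. The sketch below is only an illustration of that behaviour, not the Galaxy tool itself. The accessions come from the report, while the sequences and the `grep` helper are made up. It also shows one possible (unconfirmed) reason for an empty result: input lines that carry no leading `>`.

```python
import re

# Hypothetical FASTA text: accessions are from the report, sequences are invented.
fasta_text = """>CP020543.1
ACGTACGT
>CP024090.1
GGCCTTAA
>LT906474.1
TTGACA
"""

# "^" anchors the match at the start of a line, ">" is the FASTA header marker.
header_pattern = re.compile(r"^>")


def grep(pattern, text):
    """Return the lines of `text` that match `pattern` (hypothetical helper)."""
    return [line for line in text.splitlines() if pattern.search(line)]


headers = grep(header_pattern, fasta_text)
print(headers)

# Input without any leading ">" gives no matches, so the result is empty.
ids_only = "CP020543.1\nCP024090.1\nLT906474.1\n"
print(grep(header_pattern, ids_only))
```

If the input to the grep step is already a plain list of identifiers rather than FASTA, an empty result with `^>` would be expected; whether that was the case here is an assumption.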

hexylena commented 3 years ago

Hi @Jalalalzanin, there are some fixes in #1547 but I'm not sure we caught this one yet. I'll have a look!

Jalalalzanin commented 3 years ago

Thank you @hexylena.
Also, under the subtitle "Aggregating data", with the "Datamash (operations on tabular data)" tool parameters “Group by fields”: 1, “Operation to perform on each group”: “Type”: Count, “On column”: Column: 1, the results are not identical to those explained in the tutorial. I changed the parameters to “Group by fields”: 2 and “Type”: Count, “On column”: Column: 2, and the results then came out correct.
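
To illustrate why the choice of group column matters here, the following is a minimal sketch of a group-and-count operation in plain Python. It is not Datamash's implementation, and the two-column table (a feature name and an accession) is a hypothetical layout chosen for illustration only.

```python
from collections import Counter

# Hypothetical two-column table: column 1 is unique per row,
# column 2 repeats (one accession per genome).
rows = [
    ("gene1", "CP020543.1"),
    ("gene2", "CP020543.1"),
    ("gene3", "CP024090.1"),
    ("gene4", "LT906474.1"),
    ("gene5", "LT906474.1"),
    ("gene6", "LT906474.1"),
]


def group_count(rows, group_col):
    """Count rows per distinct value of the 1-based column `group_col`."""
    counts = Counter(row[group_col - 1] for row in rows)
    return sorted(counts.items())


by_col1 = group_count(rows, 1)  # every value is unique, so every count is 1
by_col2 = group_count(rows, 2)  # rows are aggregated per accession
print(by_col1)
print(by_col2)
```

Grouping on a column whose values are all distinct returns one group per row, which would look like the tool "did nothing"; grouping on the repeating column gives the per-genome counts.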

hexylena commented 3 years ago

@shiltemann if you have time to test #1547, I think both of these issues should be fixed.

hexylena commented 3 years ago

Hi @Jalalalzanin. We have published a new version of the tutorial. Do you perhaps have some time to test this version out? https://training.galaxyproject.org/training-material/topics/assembly/tutorials/ecoli_comparison/tutorial.html

Jalalalzanin commented 3 years ago

Hi @hexylena, I will definitely do that. Thank you for your efforts.

Jalalalzanin commented 3 years ago

Hi @hexylena, regarding the newly updated version of the tutorial: sorry, I have some notes. At the beginning, for uploading complete genomes from NCBI, the Tab-delimited file option is not available; only TSV is offered instead. I used the old file from the previous version of the tutorial. However, when the Cut tool is used to prepare the file, the output file is as shown below: [image] and these are the input parameters of the tool: [image]

When I moved to the next step, using Rule-based upload to bring the sequences into Galaxy, there was an error with the regular expression (I repeated this step many times and got the same error): [image]

BTW, the Cut tool is available in two options: [image] The last one, highlighted with a red line, was used.

thank you

hexylena commented 3 years ago

Hi @Jalalalzanin thanks for testing this out! NCBI seems to have changed the format of their table, so I'm rewriting that bit. Thanks for the detailed report!

For the multiple cut tools, the new galaxyproject/galaxy#10024 feature will hopefully fix that. I will annotate the tools appropriately.

Jalalalzanin commented 3 years ago

My pleasure @hexylena. Yes, I noticed that NCBI has been updated since the beginning of 2020. I will skip uploading the sequences into Galaxy and go to the Comparing genome architectures step as provided in the tutorial. If I find any problems, I will mention them here (sorry to bother you). Thank you.

hexylena commented 3 years ago

If I find any problems, I will mention them here (sorry to bother you)

Please do! We really appreciate this reviewing help :) Thanks for doing this.

Jalalalzanin commented 3 years ago

It's my pleasure @hexylena. Many thanks to you and the whole Galaxy team for your efforts.

Jalalalzanin commented 3 years ago

Under the subtitle “Convert LASTZ output to BED”, in the explanation of “Converting to BED”: [image] Also, in step 7: [image] But to get the results as provided in the tutorial, I changed the parameters as follows: [image]

thank you

Jalalalzanin commented 3 years ago

Hi @hexylena, under the subtitle "Extract CDSs from annotation datasets", step 4, the parameters of the Collapse Collection tool required for this step are not available: [image]

[image]

thanks for help

hexylena commented 3 years ago

step 4 the parameters of Collapse Collection tool required for this step are not available

Ok this one is funny :) It was a bug in that version of the tool which I fixed. So it's prepend instead of append and I failed to update the training accordingly.

https://github.com/phac-nml/galaxy_tools/commit/c15f325148583721999c61d69cc9a3357f2b2c99#diff-3f361fd38d5a078ef688b4386331cc39

I've pushed fixes for all of these on my branch, thanks again for reviewing!
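
As a rough illustration of the prepend/append distinction mentioned above, the sketch below merges a collection of datasets into one list of lines and adds each dataset's name either before or after every line. This is not the Collapse Collection tool's code. The function name, the tab separator, and the example lines are assumptions made for illustration.

```python
def collapse_collection(collection, name_position=None):
    """Merge datasets into one list of lines (hypothetical helper).

    `collection` maps a dataset name to its lines. `name_position` may be
    "prepend", "append", or None (do not add the name at all).
    """
    merged = []
    for name, lines in collection.items():
        for line in lines:
            if name_position == "prepend":
                merged.append(f"{name}\t{line}")  # name becomes the first column
            elif name_position == "append":
                merged.append(f"{line}\t{name}")  # name becomes the last column
            else:
                merged.append(line)
    return merged


# Invented example data; only the accessions appear in the thread.
collection = {
    "CP020543.1": ["100\t400\t+", "900\t1500\t-"],
    "CP024090.1": ["50\t300\t+"],
}

prepended = collapse_collection(collection, "prepend")
appended = collapse_collection(collection, "append")
print(prepended)
print(appended)
```

Downstream steps that select columns by number depend on which of the two variants is used, since prepending shifts every original column by one.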

Jalalalzanin commented 3 years ago

So I will continue the tutorial after the update. Thank you @hexylena.

hexylena commented 3 years ago

The tutorial has been updated in https://github.com/galaxyproject/training-material/pull/2016 if you, @Jalalalzanin, or anyone else wants to check further.

Jalalalzanin commented 3 years ago

I will check the tutorial in the coming days. If there is any problem, I will send you feedback. Thank you @hexylena for your efforts.

hexylena commented 3 years ago

Awesome, thanks so much @Jalalalzanin!!

Jalalalzanin commented 3 years ago

My pleasure @hexylena

Jalalalzanin commented 3 years ago

@hexylena I started the tutorial from the Comparing genome architectures step after downloading from the direct link provided at the beginning of the tutorial (later I will check the download steps one by one).

Under the subtitle "Getting sequences and annotations", when the Cut tool is used with the provided parameters, the results came out different because columns c10,c15 were selected. The correct columns to select are c11,c19 to get the results explained in the tutorial: [image] These are the results when the c10,c15 parameters are used: [image] and this is when I changed the parameters to c11,c19: [image] It came out correct, as you explained in the tutorial.

Jalalalzanin commented 3 years ago

In the next step, a problem with the regular expression was found, as follows: [image]

[image]

hexylena commented 3 years ago

Ok, the c10/15 one looks like it is used twice, and I forgot to update one of them. And I guess I used a different value in my Galaxy history than works with the Zenodo dataset, because the format of the TSV file changed.

I've rewritten the intro to say "please use this dataset from Zenodo" to avoid this issue in the future.

For the regex, I updated it to use column 8 rather than column 11. Can you retry with c8,c20 instead of c19? That's the RefSeq URL rather than GenBank, so that will break in other ways.
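
The column selection discussed above (c8,c20) can be sketched in a few lines of Python. This is only an illustration of 1-based column cutting on a tab-separated row, not the Galaxy Cut tool. The 20-column row is hypothetical apart from the accession and FTP path, which are taken from this thread, and their placement in columns 8 and 20 follows the suggestion above.

```python
def cut_columns(line, columns):
    """Return the given 1-based columns of a tab-separated line, tab-joined."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(fields[c - 1] for c in columns)


# Hypothetical 20-column row: filler values everywhere except columns 8 and 20.
fields = [f"col{i}" for i in range(1, 21)]
fields[7] = "GCA_002079225.1"  # column 8: assembly accession
fields[19] = (
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/079/225/"
    "GCA_002079225.1_ASM207922v1"
)  # column 20: FTP path
row = "\t".join(fields)

result = cut_columns(row, [8, 20])
print(result)
```

Because the selection is purely positional, any change in the number or order of columns in the source table silently changes what c8 and c20 refer to, which is consistent with the differences reported between the NCBI and Zenodo files.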

hexylena commented 3 years ago

https://github.com/galaxyproject/training-material/pull/2072 should ensure that we're all using the Zenodo dataset and that the correct columns (c8, c20) are used, which produce the following results:

GCA_002079225.1 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/079/225/GCA_002079225.1_ASM207922v1
GCA_002761835.1 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/761/835/GCA_002761835.1_ASM276183v1
GCA_900186905.1 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/186/905/GCA_900186905.1_49923_G01

Jalalalzanin commented 3 years ago

Yes, I repeated that and it is working well now. Thank you @hexylena for the quick response.

Jalalalzanin commented 3 years ago

Hi @hexylena, the "Replace Text" tool is not available in the list of tools: [image]

hexylena commented 3 years ago

Hi @Jalalalzanin, which server are you using? That tool is available on EU: https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0

Jalalalzanin commented 3 years ago

@hexylena I am using Galaxy Europe as well and searched for it, but it is not available: [image]

From the link you mentioned it is available: [image] Thank you.

Jalalalzanin commented 3 years ago

Hi @hexylena, I faced some problems, so I repeated the tutorial from the beginning using the NCBI data instead of the Zenodo dataset. The output file of "Select lines that match an expression" contains only 16 columns, and the last two seem similar: [image]

When the Cut tool was used with 8,20 the result was not correct, so I changed the parameters to 6,15: [image]

[image]

hexylena commented 3 years ago

Hi @Jalalalzanin I'll check this but could you please use only the zenodo dataset?

I do not want to use NCBI's because then this tutorial needs updates every time they change it :(

Jalalalzanin commented 3 years ago

@hexylena I used the Zenodo dataset before, but I faced some problems with the Replace Text tool: [image] The problem is that the "LASTZ Alignments" collection does not appear in the collection data, so I just dragged it from the history list.

[image]

But the output file looks like this: [image] That's why I went back to the beginning of the tutorial and used the NCBI data.

thanks for help

hexylena commented 3 years ago

Would it be possible for you to share your history, so I can see what went wrong with the Replace Text step?

Jalalalzanin commented 3 years ago

https://usegalaxy.eu/u/jalal_alznin85/h/making-sense-of-a-newly-assembled-genome-102020

hexylena commented 3 years ago

Perfect! That's super helpful. I'll have a look now.

hexylena commented 3 years ago

Ahh ok,

  1. I think Replace Text failed because the LASTZ collection that was created uses BAM format alignments rather than the blastn formatted alignments.
  2. Also it looks like history item 24 ("E coli c + relatives") is missing E. coli C.

Jalalalzanin commented 3 years ago

That's OK @hexylena, maybe the mistake is on my side. For the "E coli c + relatives" item, I will check the steps again.
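
On the first point in the diagnosis above (BAM rather than blastn-formatted alignments): BAM is a compressed binary format, so a line-oriented text tool cannot process it meaningfully. The sketch below shows one way to tell the two apart by content. It relies on two facts about BAM files, namely that they begin with the gzip magic bytes and that the decompressed stream begins with `BAM\x01`. The function name and the sample inputs are hypothetical, and this is not how Galaxy determines datatypes.

```python
import gzip
import io
import zlib


def looks_like_bam(data: bytes) -> bool:
    """Heuristically check whether `data` is the start of a BAM file."""
    # BAM is BGZF-compressed, so it must start with the gzip magic bytes.
    if not data.startswith(b"\x1f\x8b"):
        return False
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as handle:
            # The decompressed stream of a BAM file starts with "BAM\x01".
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError, zlib.error):
        return False


# Stand-in for a BAM file: gzip-compressed bytes with the BAM magic (not real BGZF).
fake_bam = gzip.compress(b"BAM\x01" + b"\x00" * 8)
# Invented tab-separated alignment line standing in for tabular text output.
tabular = b"CP020543.1\tquery1\t98.5\t1200\n"

print(looks_like_bam(fake_bam))
print(looks_like_bam(tabular))
```

A check like this on the first bytes of each collection element would flag binary alignments before they reach a text-replacement step.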

hexylena commented 3 years ago

Sounds good, thanks for checking :)

Jalalalzanin commented 3 years ago

You're welcome @hexylena. I appreciate your help because this will help me in my work.