Open SreeniEadara opened 2 years ago
On 16/08/2022 21:10, SreeniEadara wrote:
[...] I am able to run all steps of the pipeline, including preprocess, check, and detect without error. Upon trying to run hyb analyse, however, I am met with the following output and am not sure what is causing this problem:
hyb: Tue Aug 16 15:47:38 EDT 2022
analyse
in=testdata.txt id=testdata format=fastq code= miss=0 qc=flexbar qual=33 link=TGGAATTCTCGGGTGCCAAGGC min=4 len=17 trim=0 filt=0 pc=0 align=bowtie2 db=hOH7 word=11 eval=0.1 ref= anti=0 type=all fold=UNAfold pref=mim hval=0.1 hmax=10 gmax=4
/usr/local/Caskroom/miniconda/base/envs/hyb/bin/hyb2fasta_bits_allRNAs.awk /usr/local/Caskroom/miniconda/base/envs/hyb/data/db/hOH7.tab testdata_comp_hOH7_hybrids_ua.hyb
/usr/local/Caskroom/miniconda/base/envs/hyb/bin/hybrid-min testdata_comp_hOH7_hybrids_ua.bit_1.fasta testdata_comp_hOH7_hybrids_ua.bit_2.fasta 2>&1 > /dev/null
testdata_comp_hOH7_hybrids_ua.bit_1.fasta: No such file or directory
make: *** [testdata_comp_hOH7_hybrids_ua.bit_1.fasta-comp_hOH7_hybrids_ua.bit_2.fasta.ct] Error 1
Could you please help me understand what is causing this problem?
Hi, Sreenivas.
Great job creating a "hyb" env in Bioconda/Anaconda! It would be nice to add that to our GitHub repo when you've got it tested and working.
The missing file requires "flexbar" to run, but there is a bug in "hyb" caused by a change in the "flexbar -f" parameter: it previously specified the quality format, e.g. "-f sanger", but it now means produce FASTA output.
I'll fix this on GitHub along with your fixes for "python", which is also a problem on Ubuntu 20.04 LTS because "python" is deprecated.
I've attached a patch for "hyb" that I'm now testing...
HTH,
Tony.
-- Minke Informatics Limited, Registered in Scotland - Company No. SC419028 Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK) tel. +44(0)19755 63548 http://minke-informatics.co.uk mob. +44(0)7985 078324 @.***
Hi Tony,
Awesome! Happy to hear from you. I can definitely open a pull request containing the Conda env setup once it has been validated.
I think attachments from email replies may not make it onto GitHub Issues. Would you be able to add the patch to a development branch on this repository?
Thanks for your help!
Sincerely, Sreenivas
edit: removed my email, don't want it to be found by bots :)
On 17/08/2022 17:45, SreeniEadara wrote:
[...]
Hi, Sreenivas.
I'll commit my changes as soon as I've finished testing: I noticed a couple of dependency problems and I changed the way the INSTALL script runs. As you probably know, we developed "hyb" under Bio-Linux 8, but that distro is now obsolete. I'm testing it under Ubuntu 20.04 LTS.
Bye,
Tony.
Hi Tony,
I was able to run hyb analyse on testdata.txt and didn't encounter any errors! Would you be able to send me the expected output so I can compare it against what I have?
I ended up using WSL to install Ubuntu 20.04 LTS and followed all installation steps - further debugging didn't work on macOS when using the Conda environment. One thing to note is that I had to install manually. I used git to clone the repository, and upon running INSTALL, it found the existing files and cleared them, but subsequently failed to get the files for hyb.
Running the following command on my data produced this error:
sreenieadara@DESKTOP:/mnt/d/hyb/SRR959751$ hyb preprocess qc=flexbar trim=30 len=17 min=4 check detect align=bowtie2 word=11 analyse fold=vienna in=SRR959751.fastq.gz db=hOH7
hyb: Fri Aug 19 18:31:41 PDT 2022
preprocess check detect analyse
in=SRR959751.fastq.gz id=SRR959751 format=fastq code= miss=0 qc=flexbar qual=33 link=TGGAATTCTCGGGTGCCAAGGC min=4 len=17 trim=30 filt=0 pc=0 align=bowtie2 db=hOH7 word=11 eval=0.1 ref= anti=0 type=all fold=vienna pref=mim hval=0.1 hmax=10 gmax=4
gunzip -c SRR959751.fastq.gz > SRR959751.fastq
/usr/bin/flexbar -t SRR959751_clipped_qf -r SRR959751.fastq -q 30 -as TGGAATTCTCGGGTGCCAAGGC -ao 4 -u 3 -m 17 -n 1
flexbar: the given value '30' is not in the list of allowed values [TAIL, WIN, BWA]
Available on github.com/seqan/flexbar
make: *** [/home/sreenieadara/hyb/bin/hyb:1029: SRR959751_clipped_qf.fastq] Error 1
It looks like the -q parameter may not be the correct one to use in this case. I've changed it to -qt within bin/hyb and it is currently running. I will see if this works!
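The flag change described above can be sketched as a small helper that assembles the flexbar arguments seen in this thread. This is only an illustration in Python (hyb itself builds the command inside bin/hyb). The function name flexbar_args and its parameters are hypothetical, and the flags are copied from the two command lines quoted here rather than checked against any particular flexbar release.

```python
def flexbar_args(reads, target, trim, adapter,
                 min_overlap=4, min_len=17, threshold_flag="-qt"):
    """Return the flexbar argument list that appears in the logs of this thread.

    threshold_flag is "-q" in the failing command and "-qt" in the one that ran.
    Which flag a given flexbar release expects is an assumption to verify locally.
    """
    return [
        "flexbar",
        "-t", target,
        "-r", reads,
        threshold_flag, str(trim),
        "-as", adapter,
        "-ao", str(min_overlap),
        "-u", "3",
        "-m", str(min_len),
        "-n", "1",
    ]


cmd = flexbar_args("SRR959751.fastq", "SRR959751_clipped_qf", 30,
                   "TGGAATTCTCGGGTGCCAAGGC")
print(" ".join(cmd))
```

Passing threshold_flag="-q" reproduces the failing form, which makes it easy to compare the two command lines side by side.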
Hi Tony,
Unfortunately, the analysis is frozen at one step (over 20 hours without a change). Could you please let me know if this is expected or unexpected behavior? I am running the following on a fastq.gz of SRR959751 received via fastq-dump.
This is running in Ubuntu 20.04 LTS.
sreenieadara@DESKTOP:/mnt/d/hyb/SRR959751$ hyb preprocess qc=flexbar trim=30 len=17 min=4 check detect align=bowtie2 word=11 analyse fold=vienna in=SRR959751.fastq.gz db=hOH7
hyb: Fri Aug 19 19:07:52 PDT 2022
preprocess check detect analyse
in=SRR959751.fastq.gz id=SRR959751 format=fastq code= miss=0 qc=flexbar qual=33 link=TGGAATTCTCGGGTGCCAAGGC min=4 len=17 trim=30 filt=0 pc=0 align=bowtie2 db=hOH7 word=11 eval=0.1 ref= anti=0 type=all fold=vienna pref=mim hval=0.1 hmax=10 gmax=4
/usr/bin/flexbar -t SRR959751_clipped_qf -r SRR959751.fastq -qt 30 -as TGGAATTCTCGGGTGCCAAGGC -ao 4 -u 3 -m 17 -n 1
/home/sreenieadara/hyb/bin/solexa2fasta.awk SRR959751_clipped_qf.fastq | /home/sreenieadara/hyb/bin/fasta2tab.awk > SRR959751_clipped_qf.tab
/home/sreenieadara/hyb/bin/make_comp_fasta.pl SRR959751_clipped_qf.tab > SRR959751_comp.fasta
/usr/bin/fastqc -q -k 8 --noextract --contaminants /home/sreenieadara/hyb/data/fastqc/Contaminants SRR959751_clipped_qf.fastq
awk '{if(NR%4==2) print length($1)}' SRR959751_clipped_qf.fastq | /home/sreenieadara/hyb/bin/histogram.pl -n > SRR959751_clipped_qf.hist
Thanks!
Sincerely, Sreenivas
On 20/08/2022 23:58, SreeniEadara wrote:
[...]
Hi, Sreenivas
I ran it in less than an hour on my laptop "beluga" (Intel core-i5 + 16 GiB RAM + 500GB SSD):
@.***:~/Desktop/hyb$ time hyb preprocess qc=flexbar trim=30 len=17 min=4 check detect align=bowtie2 word=11 analyse fold=vienna in=SRR959751.fastq.gz db=hOH7 |& tee hyb.log
hyb: Mon 22 Aug 08:17:58 BST 2022
preprocess check detect analyse
in=SRR959751.fastq.gz id=SRR959751 format=fastq code= miss=0 qc=flexbar qual=33 link=TGGAATTCTCGGGTGCCAAGGC min=4 len=17 trim=30 filt=0 pc=0 align=bowtie2 db=hOH7 word=11 eval=0.1 ref= anti=0 type=all fold=vienna pref=mim hval=0.1 hmax=10 gmax=4
gunzip -c SRR959751.fastq.gz > SRR959751.fastq
/usr/bin/flexbar -t SRR959751_clipped_qf -r SRR959751.fastq -qt 30 -as TGGAATTCTCGGGTGCCAAGGC -ao 4 -u 3 -m 17 -n 1
/usr/local/hyb/bin/solexa2fasta.awk SRR959751_clipped_qf.fastq | /usr/local/hyb/bin/fasta2tab.awk > SRR959751_clipped_qf.tab
/usr/local/hyb/bin/make_comp_fasta.pl SRR959751_clipped_qf.tab > SRR959751_comp.fasta
/usr/bin/fastqc -q -k 8 --noextract --contaminants /usr/local/hyb/data/fastqc/Contaminants SRR959751_clipped_qf.fastq
awk '{if(NR%4==2) print length($1)}' SRR959751_clipped_qf.fastq | /usr/local/hyb/bin/histogram.pl -n > SRR959751_clipped_qf.hist
/usr/local/hyb/bin/fasta2tab.awk SRR959751_comp.fasta | awk '{print (length($2))}' | /usr/local/hyb/bin/histogram.pl -n > SRR959751_comp.hist
/usr/bin/bowtie2 -D 20 -R 3 -N 0 -L 16 -k 20 --local -i S,1,0.50 --score-min L,18,0 --ma 1 --np 0 --mp 2,2 --rdg 5,1 --rfg 5,1 -p 1 -x /usr/local/hyb/data/db/hOH7 -f SRR959751_comp.fasta > ./$$.sam 2> SRR959751_comp_hOH7.blast.err; \
sam2blast ./$$.sam > SRR959751_comp_hOH7.blast; \
rm ./$$.sam
rm SRR959751_comp_hOH7.blast.err
/usr/local/hyb/bin/mtophits_blast SRR959751_comp_hOH7.blast > SRR959751_comp_hOH7_mtophits.blast
/usr/local/hyb/bin/create_reference_file.pl SRR959751_comp_hOH7_mtophits.blast > SRR959751_comp_hOH7_mtophits.ref
/usr/local/hyb/bin/remove_duplicate_hits_blast.pl SRR959751_comp_hOH7_mtophits.ref SRR959751_comp_hOH7_mtophits.blast > SRR959751_comp_hOH7_singleE.blast
/usr/local/hyb/bin/blast_stats SRR959751_comp_hOH7_singleE.blast > SRR959751_comp_hOH7_singleE.blast_stats.txt
/usr/local/hyb/bin/get_mtop_hybrids.pl BLAST_THRESHOLD=0.1 MODE=2 MAX_OVERLAP=4 MAX_HITS_PER_SEQUENCE=10 OUTPUT_FORMAT=HYB SRR959751_comp_hOH7.blast > SRR959751_TEMP_FILE1_TXT
/usr/local/hyb/bin/getseqs SRR959751_TEMP_FILE1_TXT SRR959751_comp.fasta > SRR959751_comp_hOH7_hybrids.fasta
/usr/local/hyb/bin/fasta2tab.awk SRR959751_comp_hOH7_hybrids.fasta > SRR959751_TEMP_FILE1_TAB
/usr/local/hyb/bin/txt2hyb.awk SRR959751_TEMP_FILE1_TAB SRR959751_TEMP_FILE1_TXT > SRR959751_comp_hOH7_hybrids.hyb
/usr/local/hyb/bin/remove_duplicate_hybrids_hOH5.pl PREFER_MIM=1 SRR959751_comp_hOH7_mtophits.ref SRR959751_comp_hOH7_hybrids.hyb > SRR959751_comp_hOH7_hybrids_ua.hyb
/usr/local/hyb/bin/hyb2fasta_bits_allRNAs.awk /usr/local/hyb/data/db/hOH7.tab SRR959751_comp_hOH7_hybrids_ua.hyb
paste SRR959751_comp_hOH7_hybrids_ua.bit_1.fasta SRR959751_comp_hOH7_hybrids_ua.bit_2.fasta | awk 'NR%2==1{print $1"-"$2}; NR%2==0{print $1"&"$2}'|sed 's/->/-/g' > SRR959751_comp_hOH7_hybrids_ua.merged
/usr/bin/RNAup --interaction_pairwise -o -w 20 < SRR959751_comp_hOH7_hybrids_ua.merged > SRR959751_comp_hOH7_hybrids_ua.rnaup 2> /dev/null
/usr/local/hyb/bin/make_vienna SRR959751_comp_hOH7_hybrids_ua.rnaup SRR959751_comp_hOH7_hybrids_ua.merged > SRR959751_comp_hOH7_hybrids_ua.vienna
/usr/local/hyb/bin/add_dG_hyb.pl SRR959751_comp_hOH7_hybrids_ua.hyb SRR959751_comp_hOH7_hybrids_ua.vienna >SRR959751_comp_hOH7_hybrids_ua_dg.hyb
/usr/local/hyb/bin/combine_hyb_merge TWO_WAY_MERGE=1 PRINT_SEQ_IDS=1 SRR959751_comp_hOH7_hybrids_ua_dg.hyb > SRR959751_comp_hOH7_hybrids_ua_merged.hyb
/usr/local/hyb/bin/make_nicer_vienna_hOH5.awk SRR959751_comp_hOH7_hybrids_ua.vienna > SRR959751_comp_hOH7_hybrids_ua.viennad
/usr/local/hyb/bin/hybrid_stats SRR959751_comp_hOH7_hybrids_ua_dg.hyb > SRR959751_comp_hOH7_hybrids.hyb_stats.txt
rm SRR959751_comp_hOH7_hybrids_ua.rnaup SRR959751_comp_hOH7_hybrids_ua.merged SRR959751_TEMP_FILE1_TXT SRR959751_TEMP_FILE1_TAB
real 56m47.363s
user 56m48.892s
sys 2m3.663s
There is a bug in "make_vienna" when running it under Python3:
@.***:/home/ajt/src/hyb/bin# git diff make_vienna
diff --git a/bin/make_vienna b/bin/make_vienna
index 959de60..4710839 100755
--- a/bin/make_vienna
+++ b/bin/make_vienna
@@ -1,5 +1,5 @@
-#@(#)make_vienna 2022-08-17 last modified by A.J.Travis
+#@(#)make_vienna 2022-08-22 last modified by A.J.Travis
 """
 Take fasta file (with '&' separating the sequences) and output from RNAup of the vienna package, and poduce the vienna format expected
@@ -53,13 +53,13 @@ def main(rnaup_file, fasta_file):
 if line.startswith(">"):
     name, count = line, 0
 elif count == 1:
You also need to install the RNA 'Vienna' package:
wget https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_20_04/viennarna_2.5.1-1_amd64.deb
sudo gdebi viennarna_2.5.1-1_amd64.deb
Let me know how you get on?
Tony.
Hi Tony,
Looks like the bug fix for Vienna worked! I have the vienna package as well as the python3, python, and perl bindings installed. Not sure if those were necessary or not.
Here are the first 10 lines of the result file SRR959751_comp_hOH7_hybrids_ua_dg.hyb:
1215_2879 AAGAGGGACGGCCGGGGGCATTCGTATTGCTCCCTGGTGGTCTAGTGGTTAGGAT -16.60 ENSG000000XXXXX_NR003286-2_RN18S1_rRNA 1 33 919 951 3.4e-08 ENSG_ENST_chr1-trna116-GluCTC_tRNA 31 55 1 25 2e-05
1577_2209 AAGAGGGACGGCCGGGGGCTATTGCACTTGTCCCGGCCTGT -17.68 ENSG000000XXXXX_NR003286-2_RN18S1_rRNA 1 19 919 937 0.023 MIMAT0000092_MirBase_miR-92a_microRNA 20 41 1 22 0.0005
2046_1671 AGAGGGACAAGTGGCGTTCTATTGCACTTGTCCCGGCCTGT -18.99 ENSG000000XXXXX_NR003286-2_RN18S1_rRNA 1 19 1446 1464 0.023 MIMAT0000092_MirBase_miR-92a_microRNA 20 41 1 22 0.0005
3050_1082 ACTGCATTATGAGCACTTAAAGTTAAAGTGCTTATAGTGCAGGTAG -24.37 MIMAT0004493_MirBase_miR-20a*_microRNA 1 22 1 22 0.00066 MIMAT0000075_MirBase_miR-20a_microRNA 24 46 1 23 0.00018
3068_1076 GGAAGATAACTATACAACCTACTGCCTTCCTGAGGTAGTAGGTTGTGTGGTTTCA -30.53 MIMAT0004482_MirBase_let-7b*_microRNA 10 30 1 21 0.0034 MIMAT0000063_MirBase_let-7b_microRNA 31 52 1 22 0.00094
3532_922 AAGAGGGACGGCCGGGGGCATTCGTATTGCTCCCTGTGGTCTAGTGGTTAGGATT -9.76 ENSG000000XXXXX_NR003286-2_RN18S1_rRNA 1 33 919 951 3.4e-08 ENSG_ENST_chr1-trna64-GluTTC_tRNA 32 53 1 22 0.00094
3746_872 GCCCCTGGGCCTATCCTAGAACTTTGGGTTCCGGGGGGAGTATGGTTGC -17.15 MIMAT0000760_MirBase_miR-331-3p_microRNA 1 21 1 21 0.0027 ENSG000000XXXXX_NR003286-2_RN18S1_rRNA 22 49 1153 1180 3.5e-07
4016_814 AGAGGGACAAGTGGCGTTTATTGCACTTGTCCCGGCCTGT -18.99 ENSG000000XXXXX_NR003286-2_RN18S1_rRNA 1 18 1446 1463 0.079 MIMAT0000092_MirBase_miR-92a_microRNA 19 40 1 22 0.00047
4521_718 CGGAAGATAACTATACAACCTACTGCCTTCCTGAGGTAGTAGGTTGTGTGGTTTC -30.53 MIMAT0004482_MirBase_let-7b*_microRNA 11 31 1 21 0.0034 MIMAT0000063_MirBase_let-7b_microRNA 32 53 1 22 0.00094
4766_680 TCCCTGAGACCCTAACTTGTGAGTGATGGGGATCGGGGATTGC -19.82 MIMAT0000423_MirBase_miR-125b_microRNA 1 22 1 22 0.00056 ENSG000000XXXXX_NR003286-2_RN18S1_rRNA 23 43 1598 1618 0.002
How does this compare to the result you received?
Also, one additional question: say a miRNA is listed first and an mRNA is listed second in a single row. Does that mean that the chimera was a miRNA-first chimera, or are they ordered differently (e.g. in alphabetical order)?
Hi Sreenivas,
It seems that your result is very similar to ours (there are small differences which seem related to 3' adapter truncation settings).
If a miRNA is listed first, this indicates a miRNA-first chimera (the coordinates of 1st and 2nd arms in each read are in columns 5-6 and 11-12, respectively).
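The column layout described above can be illustrated with a short Python sketch that splits one .hyb row into its two arms. The function names are hypothetical and not part of hyb. Treating the read coordinates as 1-based and inclusive, and recognising a miRNA by the "_microRNA" suffix, are assumptions inferred only from the rows quoted in this thread.

```python
def parse_hyb_row(row):
    """Split a whitespace-separated .hyb row into read-level and per-arm fields."""
    f = row.split()
    return {
        "read_id": f[0],
        "sequence": f[1],
        "energy": float(f[2]),  # third column of the *_dg.hyb rows shown here
        # columns 5-6 (1-based) hold the read coordinates of the first arm
        "arm1": {"name": f[3], "read_start": int(f[4]), "read_end": int(f[5])},
        # columns 11-12 (1-based) hold the read coordinates of the second arm
        "arm2": {"name": f[9], "read_start": int(f[10]), "read_end": int(f[11])},
    }


def arm_sequence(rec, arm):
    """Return the part of the read covered by one arm (1-based, inclusive coordinates)."""
    a = rec[arm]
    return rec["sequence"][a["read_start"] - 1:a["read_end"]]


def mirna_first(rec):
    """True when the first arm's name ends in '_microRNA' (naming assumed from this thread)."""
    return rec["arm1"]["name"].endswith("_microRNA")


row = ("3746_872 GCCCCTGGGCCTATCCTAGAACTTTGGGTTCCGGGGGGAGTATGGTTGC -17.15 "
       "MIMAT0000760_MirBase_miR-331-3p_microRNA 1 21 1 21 0.0027 "
       "ENSG000000XXXXX_NR003286-2_RN18S1_rRNA 22 49 1153 1180 3.5e-07")
rec = parse_hyb_row(row)
print(rec["arm1"]["name"], arm_sequence(rec, "arm1"))
print(rec["arm2"]["name"], arm_sequence(rec, "arm2"))
print("miRNA-first:", mirna_first(rec))
```

Applied to every row of a .hyb file, mirna_first gives a quick count of miRNA-first versus target-first chimeras.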
best wishes, Greg
On Fri, 26 Aug 2022 at 16:13, SreeniEadara @.***> wrote:
[...]
Hi Greg,
Awesome! Glad to hear that the results file is similar, and good to know that the list order indicates order in the chimera.
I'm a bit confused about how to make the required databases to analyze using a different reference genome. I am able to rename target filenames in the Makefile and use 'make all' to make hg38.fasta.gz (human genome) as well as the provided hOH7-microRNA.fasta.gz, but running hyb then produces a result containing only hits between genomic loci.
Renaming hOH7-microRNA.fasta.gz to hg38-microRNA.fasta.gz, modifying the Makefile accordingly, and remaking the database produced the same result.
How would you recommend I set up the files before building the database? I am also trying to rename both files to start with hOH7 and I will see how it goes. Is there something here that I might be missing?
Hi,
Assuming you have a fasta file "input.fasta" with the sequences you want to use as your database, type this to produce the mapping database:
make_hyb_db input.fasta
You can then run hyb with the command:
HYB_DB=path/to/hyb/db hyb analyse in=data.fastq db=input
I recommend that the database contains transcripts with names formatted as in the hOH7 file distributed with hyb, but hyb should also work with a database composed of genomic or other sequences.
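As a rough illustration of the naming recommendation: the reference names in the .hyb rows quoted earlier in this thread (for example MIMAT0000092_MirBase_miR-92a_microRNA) consist of four underscore-separated fields ending in a biotype. The Python sketch below lists FASTA headers that do not follow that pattern. The four-field rule is inferred only from those examples, and check_fasta_names is a hypothetical helper, not part of hyb.

```python
def check_fasta_names(lines, n_fields=4):
    """Return FASTA header names that lack n_fields non-empty underscore-separated parts."""
    bad = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        parts = line[1:].split()
        name = parts[0] if parts else ""
        fields = name.split("_")
        if len(fields) != n_fields or "" in fields:
            bad.append(name)
    return bad


# Toy input: two names copied from this thread and one genomic-style name.
example = [
    ">MIMAT0000092_MirBase_miR-92a_microRNA",
    "TATTGCACTTGTCCCGGCCTGT",
    ">ENSG000000XXXXX_NR003286-2_RN18S1_rRNA",
    "AAGAGGGACGGCCGGGGGC",
    ">chr1",
    "ACGT",
]
print(check_fasta_names(example))
```

For a real database, the same function can be given an open file handle, e.g. check_fasta_names(open("input.fasta")), before running make_hyb_db.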
Greg
On Tue, 30 Aug 2022 at 14:57, SreeniEadara @.***> wrote:
[...]
Hi Greg,
Thanks! I think I understand the process for building the databases a bit better now.
Also wanted to add that in Ubuntu 20.04 LTS within Windows Subsystem for Linux, the following line worked a bit better for BLAT installation within the INSTALL script:
make MACHTYPE=$MACHTYPE
Hi Greg,
I'm running into issues trying to use INSTALL on new Ubuntu 20.04 installations. I am able to get Hyb to work, but this involved building BLAT from source and making the default databases using 'make all'. I believe this is because rsync isn't a default package on 20.04 LTS, so after the directory is cleared the latest source isn't received. A modified INSTALL script worked better:
#!/bin/bash
#@(#)INSTALL 2022-08-22 A.J.Travis
#
# Install "hyb" under Ubuntu 20.04 LTS
#
# GitHub repository
export GITHUB=https://github.com/gkudla/hyb
# installation directory
if [ "$USER" == root ]; then
    export HYB_HOME=/usr/local/hyb
else
    export HYB_HOME=${HOME}/hyb
fi
# set PATH for "hyb" test run
export PATH=${HYB_HOME}/bin:$PATH
echo "Please add ${HYB_HOME}/bin to your PATH after running the INSTALL script"
echo "(press any key to continue...)"
read -n 1 key; echo
# download directory must be writeable
dir=$(pwd)
if [ ! -w "${dir}" ]; then
    echo "$0: can't write to ${dir}"
    exit 1
fi
# check if "hyb" is already installed
if [ -e "${HYB_HOME}" ]; then
    echo "$0: ${HYB_HOME} already exists - replace it?"
    read -n 1 key; echo
    if [ "$key" == "y" ]; then
        echo "alright . . . "
    else
        echo "$0: installation cancelled"
        exit 1
    fi
fi
# libpng-dev (required to compile BLAT)
if [ ! -r "/usr/include/libpng16/png.h" ]; then
    if [ "$USER" == root ]; then
        apt install libpng-dev
    else
        echo "$0: install libpng-dev to test hyb"
        exit 1
    fi
fi
# download and compile BLAT
wget -nc http://users.soe.ucsc.edu/~kent/src/blatSrc35.zip
unzip blatSrc35.zip
export MACHTYPE=$(arch)
mkdir -p "${HOME}/bin/${MACHTYPE}"
cd blatSrc
make MACHTYPE=$MACHTYPE
# move to BLAT installation directory
if [ "$USER" == root ]; then
    mv -i "${HOME}/bin/${MACHTYPE}"/* /usr/local/bin/
else
    export PATH=${HOME}/bin/${MACHTYPE}:${PATH}
fi
# build databases
cd "${HYB_HOME}/data/db"
make
# Flexbar
if [ ! -x "$(which flexbar)" ]; then
    if [ "$USER" == root ]; then
        apt install flexbar
    else
        echo "$0: install flexbar to test hyb"
        exit 1
    fi
fi
# bowtie2
if [ ! -x "$(which bowtie2)" ]; then
    if [ "$USER" == root ]; then
        apt install bowtie2
    else
        echo "$0: install bowtie2 to test hyb"
        exit 1
    fi
fi
# UNAfold
if [ ! -x "$(which hybrid-min)" ]; then
    if [ "$USER" == root ]; then
        wget http://www.unafold.org/download/oligoarrayaux-3.8.tar.bz2
        tar xf oligoarrayaux-3.8.tar.bz2
        # configure and build before installing; the subshell leaves the working directory unchanged
        (cd oligoarrayaux-3.8 && ./configure && make && make install)
    else
        echo "$0: install bio-linux-oligoarrayaux to test hyb"
        exit 1
    fi
fi
# Vienna RNA
if [ ! -x "$(which RNAfold)" ]; then
    if [ "$USER" == root ]; then
        wget https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_20_04/viennarna_2.5.1-1_amd64.deb
        gdebi viennarna_2.5.1-1_amd64.deb
    else
        echo "$0: install Vienna RNA to test hyb"
        exit 1
    fi
fi
# test
cd "${HYB_HOME}/data/fastq"
hyb analyse in=testdata.txt db=hOH7
# finished
exit 0
It seems there may be a decent number of packages that have to be installed outside of the INSTALL script, including rsync, wget, make, and unzip. The steps I followed during installation are here:
Ubuntu 20.04 LTS can be installed on Windows with the following command in PowerShell (run as an administrator):
wsl --install -d Ubuntu-20.04
Upon restart, an empty Linux shell will appear. You may need to press Enter to continue the installation. Hyb was installed as follows on Ubuntu 20.04 LTS. First, hyb source is cloned from GitHub:
git clone https://github.com/gkudla/hyb.git
Dependencies available on apt are installed:
sudo apt update
sudo apt install wget libpng-dev flexbar bowtie2 make gcc unzip ncbi-blast+ fastqc gdebi-core rnahybrid rsync
Package oligoarrayaux version 3.8 is installed as follows:
wget http://www.unafold.org/download/oligoarrayaux-3.8.tar.gz
gunzip oligoarrayaux-3.8.tar.gz
tar -xvf oligoarrayaux-3.8.tar
cd oligoarrayaux-3.8
./configure
make
make check
sudo make install
make clean
The SRA (Sequence Read Archive) tools must be downloaded and unzipped:
wget --output-document sratoolkit.tar.gz https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
tar -vxzf sratoolkit.tar.gz
In order for the SRA tools to work, they must be added to the PATH. The PATH may reset with every new session.
export PATH=$PATH:$PWD/sratoolkit.3.0.0-ubuntu64/bin
The SRA tools must then be configured. This only needs to be performed once. Running the following command will launch the interactive SRA tools configuration utility. Under the “Cache” tab, the directory for local file caching should be set to an empty directory.
vdb-config -i
The viennaRNA package should then be installed:
wget https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_20_04/viennarna_2.5.1-1_amd64.deb
wget https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_20_04/python3-rna_2.5.1-1_amd64.deb
wget https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_20_04/perl-rna_2.5.1-1_amd64.deb
sudo gdebi viennarna_2.5.1-1_amd64.deb
sudo gdebi python3-rna_2.5.1-1_amd64.deb
sudo gdebi perl-rna_2.5.1-1_amd64.deb
BLAT is installed as follows:
wget -nc http://users.soe.ucsc.edu/~kent/src/blatSrc35.zip
unzip blatSrc35.zip
export MACHTYPE=$(arch)
mkdir -p ${HOME}/bin/${MACHTYPE}
cd blatSrc
make MACHTYPE=$MACHTYPE
sudo mv -i ${HOME}/bin/${MACHTYPE}/* /usr/local/bin
Hyb includes a human transcriptome and miRNA database (hOH7) by default. Databases can be built as follows, starting from the top of the cloned hyb directory:
cd data/db
make all
You can test that Hyb was installed correctly with the following:
cd ..
cd fastq
hyb analyse in=testdata.txt db=hOH7
You can check the resulting .hyb files to verify that Hyb was successfully installed (there should be four, ending as follows:
This procedure works well for me but may not be ideal for all users. Do you think you could post these instructions or modify the INSTALL script so that it works better on 20.04 LTS? Please let me know if there is something I am missing and INSTALL should be working normally. If you would like, I can also open a pull request to update the README with these instructions.
On 10/09/2022 19:01, SreeniEadara wrote:
Hi Greg,
I'm running into issues trying to use INSTALL on new Ubuntu 20.04 installations. I am able to get Hyb to work, but this involved building BLAT from source and making the default databases using 'make all'. I believe this is because rsync isn't a default package on 20.04 LTS, so after the directory is cleared the latest source isn't received. A modified INSTALL script worked better: [...]
Hi, SreeniEadara.
We developed "hyb" under "Bio-Linux", where most of the dependencies were already installed. I'll modify the INSTALL script to check that all the dependencies on your list are installed before the script tries to install "hyb" and the other dependencies that are not in the Ubuntu repositories.
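A dependency check of this kind could look roughly like the Python sketch below. It is not the check in the INSTALL script: the helper name missing_commands is hypothetical, and the command list is simply assembled from tools mentioned in this thread.

```python
import shutil


def missing_commands(commands):
    """Return the commands from the list that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


# Commands named in this thread; adjust the list to match your own setup.
required = ["wget", "make", "unzip", "rsync", "gcc",
            "flexbar", "bowtie2", "fastqc", "hybrid-min", "RNAfold"]
missing = missing_commands(required)
if missing:
    print("missing:", " ".join(missing))
else:
    print("all listed commands found")
```

Running it before INSTALL shows which tools still need to be installed with apt or built from source.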
Ideally, "hyb" should be a .deb package - That's a work in progress.
Thanks for your effort to get "hyb" to work under WSL,
Tony.
Hi Tony,
Sounds good! Let me know if I can help validate a new install script on WSL.
Sincerely, Sreenivas
Hi, SreeniEadara.
Sorry it's taken me so long to respond: I've just updated the INSTALL script, to include the missing dependencies that you suggested. Please let me know about any issues if you try it out.
Thanks for your interest in "hyb",
Tony.
Hi Tony,
Can you please let me know how to use your install script?
thanks Greg
On Tue, 13 Dec 2022 at 23:13, Tony Travis @.***> wrote:
[...]
On 14/12/2022 09:52, gkudla wrote:
Hi Tony,
Can you please let me know how to use your install script?
Hi, Greg.
It's just a "bash" shell script:
bash INSTALL
or
chmod +x INSTALL
./INSTALL
I think "hyb" should be distributed as a deb package, and I've discussed packaging it for Debian/Ubuntu with the Debian-Med team.
HTH,
Tony.
Hi,
I'm trying to run hyb on the example data using Mac OSX on a 2018 MacBook Air.
I've installed all dependencies besides flexbar 2.5 using Conda (edit: flexbar 2.5 was installed manually). My list of installed packages is as follows:
I've also configured my Conda environment to set a few useful paths on activation as follows. The paths are all unset prior to deactivation:
I've also configured sra-tools, and changed the shebang line on the top of sam2blast to
#!/usr/bin/env python3
so it can work on macOS. All of the contents of hyb's source, including the scripts in bin, the entry in man, data, and lib have been moved to the corresponding folders in the path of the Conda environment so that they can easily be accessed upon activation. I also used make all to make the included hOH7 database.
I am able to run all steps of the pipeline, including preprocess, check, and detect without error. Upon trying to run hyb analyse, however, I am met with the following output and am not sure what is causing this problem:
Could you please help me understand what is causing this problem?
Thanks!
Sincerely, Sreenivas