EddyRivasLab / easel

Sequence analysis library used by Eddy/Rivas lab code

HMMER esl-sfetch corrupt coords #49

Closed 3719left closed 3 years ago

3719left commented 3 years ago

I am using esl-sfetch to retrieve about 40K subsequences, and it aborts with the following error:

awk '{print $1"/"$2"-"$3, $2, $3, $1}' newdata.txt | esl-sfetch -Cf KSfullprotein.fa - > juKS.fa

Fatal exception (source file esl_sqio_ascii.c, line 1986):
Failed to fetch subsequence residues -- corrupt coords?
Aborted (core dumped)

FYI, sample lines from the awk output:
EGP_GG01D1_R2_100387_2/2-365 2 365 EGP_GG01D1_R2_100387_2
EGP_GG01D1_R2_102123_29/2-368 2 368 EGP_GG01D1_R2_102123_29
EGP_GG01D1_R2_105669_1/1-184 1 184 EGP_GG01D1_R2_105669_1
EGP_GG01D1_R2_11253_1/81-392 81 392 EGP_GG01D1_R2_11253_1
EGP_GG01D1_R2_127404_1/3-232 3 232 EGP_GG01D1_R2_127404_1

grep -A 3 "EGP_GG01D1_R2_11253_1" KSfullprotein.fa
>EGP_GG01D1_R2_11253_1
VDVSEIREEAGFFDLGMDSLMAIELRRRLEQSVGKELPATLAMDFPRLSDVADYLLGDVLGLTEKPGAATPVQPSATTASDEPIAIISVACRFPGSPDADAYWEVLSGGVDAIREIPEDRFDVDEFYDPDQQAPGKIYTRSGGYLDRVDEFDPEFFGISPREAVWMDPQQRLMLEIAWESLERAGYAPASLRGSRTGVFVGVGANEYAHLMSGNSVEHLEAYFITGNALNAVAGRVAFTLGLEGPAVAMDTACSSSLVAVHQATQALRSGDCDMALAGGVNILLSPASIVAASRARMLAPDGRCKTFDAAADGYVRGEGCGILVLKRLSDAQRDGDRICAVIRSTAVNQDGASSGLTVPNGGAQQRLIRAALARAGLRGGDVDYLEAHGTGT
>EGP_GG01D1_R2_127404_1
AATGSGGPRIGLVLGLGAEHLKRWEGDFLAGGTRVFEPRRERTIVHALARRLQIRGPAVTVAAACASSGYAMAMGRSWIHAGWVDACVVGGCDILSPTAIAAFYNLRALSRRSDEPAKASRPFDKARDGFVMGEGGAFFMLERQSAAVARGARRYGELAGVGMSSDGVHMVIPSSDPVQAAAAITAALVDADAAPADVDYVNAHAAGTPVGDVAEAGAIRLALGTAADGVPV

The fourth one somehow failed.
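
One way to rule out plain out-of-range coordinates before fetching is to compare every requested range against the actual sequence lengths in the FASTA file. The following is only a minimal sketch: `check_coords` is a hypothetical helper name, and it assumes the coordinate file has whitespace-separated `name start end` columns (the layout the awk command above reads from newdata.txt) and that sequence names are unique in the FASTA file. It does not look at the SSI index at all, so a clean result only rules out one possible cause.

```shell
# check_coords FASTA COORDS   (hypothetical helper, not part of Easel)
# FASTA:  FASTA file; sequences may span several lines
# COORDS: whitespace-separated lines of: name start end
# Prints one line per problematic request; prints nothing if every range fits.
check_coords() {
    awk '
        # First file (FASTA): record the length of each sequence by name.
        NR == FNR {
            if (/^>/) { name = substr($1, 2); len[name] += 0 }  # first word after ">"
            else      { len[name] += length($0) }
            next
        }
        # Second file (coords): flag unknown names and ranges outside 1..length.
        !($1 in len)           { print "MISSING", $1; next }
        $2 < 1 || $3 > len[$1] { print "OUT_OF_RANGE", $1, $2, $3, "len=" len[$1] }
    ' "$1" "$2"
}
```

With the files from this report it would be called as `check_coords KSfullprotein.fa newdata.txt`. Any `OUT_OF_RANGE` or `MISSING` line points at the input; no output means the coordinates themselves fit the sequences as stored in the file.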

Same situation as https://www.biostars.org/p/402423/

Is it a bug?