elzbth / jitterbug

Jitterbug is a bioinformatic software that predicts insertion sites of transposable elements in a sample sequenced by short paired-end reads with respect to an assembled reference.

Filtering? #6

Closed kbrevs closed 7 years ago

kbrevs commented 8 years ago

Hi,

I'm attempting the filtering script, and for output I am just getting an empty file, with no results or header - the first cluster gff file coming from the primary jitterbug script has about 10,000 insertions - am I doing something wrong, or is the filtering step filtering... everything?

K

mbosio85 commented 8 years ago

Hi,

Would you mind posting here the command line you used, the output you got on the console, and maybe the first lines of your GFF file?

Thanks, Mattia

kbrevs commented 8 years ago

Hi Mattia,

For sure:

[jitterbug-master] python tools/jitterbug_filter_results_func.py -g jitterbug.TE_insertions_paired_clusters.gff3 -c jitterbug.filter_config.txt -o jitterbug.TE_insertions_paired_clusters.filtered.gff3
[jitterbug-master]

(there is no output)

from jitterbug.TE_insertions_paired_clusters:

Scaffold1599 jitterbug TE_insertion 48811 49171 . . . supporting_fwd_reads=2; supporting_rev_reads=2; cluster_pair_ID=0; lib=None; Inserted_TE_tags_fwd=undefined; Inserted_TE_tags_rev=undefined; fwd_cluster_span=0; rev_cluster_span=0; softclipped_pos=(-1, -1); softclipped_support=0; het_core_reads=-1; zygosity=-1.000
Scaffold1599 jitterbug TE_insertion 76441 76483 . . . supporting_fwd_reads=4; supporting_rev_reads=12; cluster_pair_ID=1; lib=None; Inserted_TE_tags_fwd=undefined; Inserted_TE_tags_rev=undefined; fwd_cluster_span=175; rev_cluster_span=11; softclipped_pos=(-1, -1); softclipped_support=0; het_core_reads=-1; zygosity=-1.000
Scaffold317 jitterbug TE_insertion 25375 25453 . . . supporting_fwd_reads=2; supporting_rev_reads=2; cluster_pair_ID=2; lib=None; Inserted_TE_tags_fwd=undefined; Inserted_TE_tags_rev=undefined; fwd_cluster_span=0; rev_cluster_span=0; softclipped_pos=(-1, -1); softclipped_support=0; het_core_reads=-1; zygosity=-1.000
Scaffold317 jitterbug TE_insertion 43055 43179 . . . supporting_fwd_reads=2; supporting_rev_reads=2; cluster_pair_ID=3; lib=None; Inserted_TE_tags_fwd=undefined; Inserted_TE_tags_rev=undefined; fwd_cluster_span=0; rev_cluster_span=0; softclipped_pos=(-1, -1); softclipped_support=0; het_core_reads=-1; zygosity=-1.000

Thanks, Kristian
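For anyone inspecting records like the ones quoted above, here is a minimal Python sketch that splits one line into its coordinates and a dictionary of attributes. It is not part of jitterbug; the function names `parse_attributes` and `parse_record` are made up for illustration, and it assumes the first eight columns contain no spaces and the attributes are separated by semicolons, as in the quoted lines.

```python
def parse_attributes(column9):
    """Split a 'key=value; key=value' attribute string into a dict of strings."""
    attributes = {}
    for field in column9.split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition("=")
        attributes[key.strip()] = value.strip()
    return attributes


def parse_record(line):
    """Parse one GFF3 line (tab- or space-separated) as quoted in this thread."""
    # At most 8 splits: the ninth piece is the whole attribute column,
    # which may itself contain spaces, e.g. softclipped_pos=(-1, -1).
    parts = line.strip().split(None, 8)
    return {
        "seqid": parts[0],
        "start": int(parts[3]),
        "end": int(parts[4]),
        "attributes": parse_attributes(parts[8]),
    }


# Second record from the excerpt posted above.
sample_line = (
    "Scaffold1599 jitterbug TE_insertion 76441 76483 . . . "
    "supporting_fwd_reads=4; supporting_rev_reads=12; cluster_pair_ID=1; "
    "lib=None; Inserted_TE_tags_fwd=undefined; Inserted_TE_tags_rev=undefined; "
    "fwd_cluster_span=175; rev_cluster_span=11; softclipped_pos=(-1, -1); "
    "softclipped_support=0; het_core_reads=-1; zygosity=-1.000"
)
record = parse_record(sample_line)
print(record["seqid"], record["end"] - record["start"])
print(record["attributes"]["supporting_rev_reads"])
```

Attribute values are kept as strings, so numeric comparisons need an explicit `int()` or `float()` conversion.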

mbosio85 commented 8 years ago

Hi Kristian,

An empty output is possible if none of the clusters fit the filtering parameters in the jitterbug.filter_config.txt file.

Can you check this, maybe by editing the file so that it includes basically everything, and see whether you get any output?

Cheers, Mattia

kbrevs commented 8 years ago

Hi Mattia - I was starting to think along those lines - what are the most permissive parameters?

mbosio85 commented 8 years ago

Can you post your file here? I don't remember the parameter names by heart. The format is Parameter min_value max_value

I would guess the most permissive values are something like 0 1000 for read coverage, for example, just to check whether the code has a bug or whether the empty output is due to some specific filtering setup.

Mattia
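As a rough illustration of the "Parameter min_value max_value" format described above, the following Python sketch reads such a file into a dictionary of (min, max) pairs. It is an assumption-laden sketch, not the parsing code used by the jitterbug filter script: the function name `parse_filter_config` is hypothetical, and it assumes one parameter per line with whitespace or tab separators.

```python
def parse_filter_config(text):
    """Read 'name min max' lines into {name: (min, max)}.

    Lines that do not have exactly three fields are skipped.
    """
    limits = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name, low, high = fields
        limits[name] = (float(low), float(high))
    return limits


# Values as reported later in this thread for the generated config file.
config_text = """cluster_size 2 5
span 2 550
int_size 124 190
softclipped 2 14
pick_consistent 0 -1
"""

limits = parse_filter_config(config_text)
for name, (low, high) in limits.items():
    print(name, low, high)
```

Printing the parsed limits is a quick way to confirm that the file is being read the way you expect before looking for a bug elsewhere.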

kbrevs commented 8 years ago

Hi Mattia, here is what jitterbug produced:

cluster_size 2 5
span 2 550
int_size 124 190
softclipped 2 14
pick_consistent 0 -1

I changed everything to 0-1000 and tried a few other adjustments, but nothing generated a filtered file.

cluster_size 0 1000
span 0 1000
int_size 0 1000
softclipped 0 1000
pick_consistent 0 1000

Do you think this is an issue with jitterbug or perhaps with the TE annotations generated by RepeatMasker/Modeler?

Thanks, Kristian

kbrevs commented 8 years ago

Okay, just to see if our TE database was part of the issue, I ran the jitterbug main script with just the defaults, and the paired clusters file contains 2010 entries. Even with this, the filtering stage simply generates an empty file.

mbosio85 commented 8 years ago

I am trying to think of possible causes, but the filtering is basically a text-editing script, which should not cause that kind of problem. If you agree, I can try the code here on my laptop on your file (or part of it) and see whether the result is consistent. If the output is still empty, I can run some tests to see which filter excludes which lines, etc.

I am sorry I cannot provide you an easier solution for this issue.

Mattia
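A test of "which filter excludes which lines" could look roughly like the Python sketch below. The mapping from filter names to GFF fields in `measured_values` is a guess made for illustration only (for example, that `cluster_size` is compared against both supporting read counts and that `int_size` is end minus start); the actual filter script may define these differently, and `pick_consistent` is left out because its meaning is not described in this thread.

```python
from collections import Counter


def measured_values(record):
    """Hypothetical mapping from filter name to the values it is checked against."""
    return {
        "cluster_size": [record["supporting_fwd_reads"], record["supporting_rev_reads"]],
        "span": [record["fwd_cluster_span"], record["rev_cluster_span"]],
        "int_size": [record["end"] - record["start"]],
        "softclipped": [record["softclipped_support"]],
    }


def failed_filters(record, limits):
    """Return the names of the filters whose [min, max] range the record violates."""
    failed = []
    for name, values in measured_values(record).items():
        if name not in limits:
            continue
        low, high = limits[name]
        if any(value < low or value > high for value in values):
            failed.append(name)
    return failed


def rejection_counts(records, limits):
    """Count, per filter, how many records it would reject on its own."""
    counts = Counter()
    for record in records:
        counts.update(failed_filters(record, limits))
    return counts


def make_record(start, end, fwd, rev, fwd_span, rev_span, softclipped):
    return {
        "start": start, "end": end,
        "supporting_fwd_reads": fwd, "supporting_rev_reads": rev,
        "fwd_cluster_span": fwd_span, "rev_cluster_span": rev_span,
        "softclipped_support": softclipped,
    }


# The four records posted earlier in this thread.
records = [
    make_record(48811, 49171, 2, 2, 0, 0, 0),
    make_record(76441, 76483, 4, 12, 175, 11, 0),
    make_record(25375, 25453, 2, 2, 0, 0, 0),
    make_record(43055, 43179, 2, 2, 0, 0, 0),
]

original_limits = {"cluster_size": (2, 5), "span": (2, 550),
                   "int_size": (124, 190), "softclipped": (2, 14)}
permissive_limits = {name: (0, 1000) for name in original_limits}

print("original:", dict(rejection_counts(records, original_limits)))
print("permissive:", dict(rejection_counts(records, permissive_limits)))
```

Under these assumed definitions, a per-filter count shows at a glance whether a single range is rejecting every record or whether the empty output must come from somewhere else in the script.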

kbrevs commented 8 years ago

No worries! I'd be happy to send the files over to you to try - where should I send them?

K

mbosio85 commented 8 years ago

Send it to mattia.bosio@crg.eu. I'll be happy to try it and get back to you with some feedback. Would you mind also sending the original filter parameters file and the one you modified?

Thanks!!

Mattia



mbosio85 commented 7 years ago

Hi,

Thanks to another user, we found a way to fix the zygosity calculation. The code in the git repository is now updated and should solve that item.

As for the subfamily, that is more a matter of an annotation field missing from the input. The filtering script will now keep lines where neither the fwd nor the rev family type is present, so there is no need to remove the "consistency" filter line.

Let me know how it goes. Mattia
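The consistency behaviour described in the last comment can be pictured with a small Python sketch. This is not the code from the repository: the function name `passes_consistency` is hypothetical, the family names in the example calls are invented, and the handling of a record with only one annotated side is a guess.

```python
UNDEFINED = "undefined"  # placeholder value seen in the GFF records in this thread


def passes_consistency(tags_fwd, tags_rev):
    """Keep a record if both sides lack a TE annotation or both sides agree."""
    fwd_missing = tags_fwd in ("", UNDEFINED)
    rev_missing = tags_rev in ("", UNDEFINED)
    if fwd_missing and rev_missing:
        # No family annotation on either side: nothing to compare, so keep it.
        return True
    # Assumption: otherwise the two annotations must match exactly.
    return tags_fwd == tags_rev


print(passes_consistency("undefined", "undefined"))
print(passes_consistency("Gypsy", "Gypsy"))   # invented family names
print(passes_consistency("Gypsy", "Copia"))
```

With a rule of this shape, records such as those quoted earlier, where both Inserted_TE_tags fields are "undefined", are no longer dropped by the consistency filter.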