antigenomics / vdjdb-web

Interactive browser for the VDJdb database
https://vdjdb.cdr3.net
Apache License 2.0

Export generating unexpected data #87

Open arkhan19 opened 2 years ago

arkhan19 commented 2 years ago

Paired gene export is giving different results than expected. I have selected both TRA and TRB in the filters. What exactly does this option do? I get a different number of samples depending on whether it is enabled.

What I am doing:

  1. Selected TRA and TRB in Filters
  2. Export all the data, either paired or as separate TRA and TRB.

What I am getting:

When Paired gene export is enabled:

TRB    42767
TRA    30002

When Paired gene export is disabled:

TRB    42658
TRA    29469
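Per-chain counts like the ones above can be reproduced from an exported file with a short script. This is a minimal sketch, assuming the export is tab-separated with a header row and a column named `Gene` holding TRA/TRB; that column name is an assumption, so check it against your file's header.

```python
import csv
import io
from collections import Counter


def count_by_gene(tsv_text: str, gene_column: str = "Gene") -> Counter:
    """Count rows per TCR chain (e.g. TRA/TRB) in a tab-separated export.

    ASSUMPTION: the export has a header row with a column named `gene_column`.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[gene_column] for row in reader)
```

Usage: read the exported file (e.g. a hypothetical `export.tsv`) into a string and pass it to `count_by_gene`, once for the paired export and once for the unpaired one, then compare the two counters.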

What I expected: I assumed that some entries have no available pair, so the unpaired set should be larger than the set of entries whose pair is available. Why do I get more samples when the option is enabled? This is based only on intuition; I haven't checked the code.

bvdmitri commented 2 years ago

Hey! This might be expected: even if you tick both TRA and TRB, some entries may still fail other filters. In contrast, paired gene export ignores any other filters you specified. If you are sure this is not the case, you can try to diff your results and give a bit more context here so we can figure it out together.
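Diffing the two exports, as suggested above, can be sketched in Python. This is a minimal sketch, not part of vdjdb-web: it assumes both exports are tab-separated with a header row, and that a row is identified by the columns `Gene`, `CDR3`, `V` and `J` (assumed names; adjust them to your file). Rows are counted rather than put in a set, so duplicated entries are not collapsed.

```python
import csv
import io
from collections import Counter
from typing import Sequence, Tuple


def row_keys(tsv_text: str, key_columns: Sequence[str]) -> Counter:
    """Count each row by a tuple of identifying columns (duplicates are kept)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(tuple(row[c] for c in key_columns) for row in reader)


def diff_exports(
    paired_tsv: str,
    unpaired_tsv: str,
    key_columns: Sequence[str] = ("Gene", "CDR3", "V", "J"),  # ASSUMED column names
) -> Tuple[Counter, Counter]:
    """Return (rows only in the paired export, rows only in the unpaired export)."""
    paired = row_keys(paired_tsv, key_columns)
    unpaired = row_keys(unpaired_tsv, key_columns)
    # Counter subtraction keeps only positive counts, so each side lists its surplus rows.
    return paired - unpaired, unpaired - paired
```

Inspecting the rows that appear only in the paired export is a quick way to see which filter they would have failed.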

bvdmitri commented 2 years ago

@f3n1Xx For example, does the output match if you go to the Meta filters panel and tick both the Include non-canonical and Include unmapped V/J filter options?

arkhan19 commented 2 years ago

I have filtered for mouse and human species with both TRA and TRB selected. No other filters were changed.

arkhan19 commented 2 years ago

> @f3n1Xx For example, does the output match for you if you go to the Meta filters panel and tick both Include non-canonical and Include unmapped V/J filter options?

My sample count increased by about 2,000.

bvdmitri commented 2 years ago

> I have filtered mouse and human species with both TRA and TRB selected. No other filter were changed.

Yes, but some other filters are enabled by default (Include non-canonical, as I mentioned, is one example).

> my samples increased by 2 thousand

Can you check whether you still get different results with both Include non-canonical and Include unmapped V/J filter ticked? In my case, if I select TRA and TRB for mouse and human species and tick both options, the Include paired export option makes no difference in the number of samples.

bvdmitri commented 2 years ago

@f3n1Xx By default, vdjdb.cdr3.net neither shows nor returns spurious CDR3 sequences (unless you explicitly enable them), but the "export paired" option ignores this setting and returns all available data, whether spurious or not.

So I would say what you observe is expected behaviour, and your exported data should contain some extra entries that are not included by default.
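The behaviour described in this comment can be illustrated with a toy model. This is a simplified Python sketch, not the vdjdb-web implementation: the `Record` type and its `canonical` and `vj_mapped` flags are hypothetical stand-ins for the Include non-canonical and Include unmapped V/J filter options, and the paired branch simply skips those meta filters, as described in the thread.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    gene: str
    cdr3: str
    canonical: bool  # HYPOTHETICAL flag: CDR3 passes canonical-sequence checks
    vj_mapped: bool  # HYPOTHETICAL flag: V/J segments could be mapped


def passes_meta_filters(
    record: Record,
    include_non_canonical: bool = False,
    include_unmapped: bool = False,
) -> bool:
    """Default meta filters: drop non-canonical and unmapped entries unless opted in."""
    if not record.canonical and not include_non_canonical:
        return False
    if not record.vj_mapped and not include_unmapped:
        return False
    return True


def export(
    records: Iterable[Record],
    paired_export: bool,
    include_non_canonical: bool = False,
    include_unmapped: bool = False,
) -> List[Record]:
    """Toy export: the paired branch skips the meta filters, per the thread."""
    if paired_export:
        return list(records)
    return [
        r for r in records
        if passes_meta_filters(r, include_non_canonical, include_unmapped)
    ]
```

In this model, ticking both options makes the unpaired export return the same entries as the paired one, which matches the check suggested earlier in the thread.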