eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0
562 stars 105 forks

Go Evidence Annotation Option #119

Closed: davidbio closed this issue 5 years ago

davidbio commented 5 years ago

Hi,

I have questions regarding the --go_evidence option. The help states:

--go_evidence {experimental,non-electronic} Defines what type of GO terms should be used for annotation:experimental = Use only terms inferred from experimental evidencenon-electronic = Use only non-electronically curated terms

In the emapper.py script I can see that non-electronic is apparently the default value:

```python
pg_annot.add_argument('--go_evidence', type=str, choices=('experimental', 'non-electronic'),
                      default='non-electronic',
                      help='Defines what type of GO terms should be used for annotation:'
                      'experimental = Use only terms inferred from experimental evidence'
                      'non-electronic = Use only non-electronically curated terms')
```
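The run-together words in the help text quoted above ("annotation:experimental", "evidencenon-electronic") are most likely a side effect of Python's implicit concatenation of adjacent string literals: the three `help=` pieces are joined with no separator between them. A minimal sketch reproducing this follows; the standalone parser, `HELP_AS_QUOTED`, `HELP_FIXED` and `build_parser` are hypothetical names for illustration, not eggnog-mapper code, and the fixed variant simply ends each piece with explicit punctuation and a space.

```python
import argparse

# Adjacent string literals are concatenated at compile time with no separator,
# exactly as in the quoted add_argument() call.
HELP_AS_QUOTED = ('Defines what type of GO terms should be used for annotation:'
                  'experimental = Use only terms inferred from experimental evidence'
                  'non-electronic = Use only non-electronically curated terms')

# Hypothetical fix: the same text with explicit separators at the end of each piece.
HELP_FIXED = ('Defines what type of GO terms should be used for annotation: '
              'experimental = Use only terms inferred from experimental evidence; '
              'non-electronic = Use only non-electronically curated terms')


def build_parser(help_text):
    """Build a throwaway parser that mirrors the quoted --go_evidence definition."""
    parser = argparse.ArgumentParser(prog='emapper_sketch')
    parser.add_argument('--go_evidence', type=str,
                        choices=('experimental', 'non-electronic'),
                        default='non-electronic',
                        help=help_text)
    return parser


if __name__ == '__main__':
    print('annotation:experimental' in HELP_AS_QUOTED)   # True: words run together
    print('evidencenon-electronic' in HELP_AS_QUOTED)    # True
    print('evidencenon-electronic' in HELP_FIXED)        # False
    print(build_parser(HELP_FIXED).parse_args([]).go_evidence)  # non-electronic
```

Since argparse rewraps help text on whitespace, a missing space cannot be recovered at display time; it has to be added in the literals themselves.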

Further down, I spotted this:

```python
if args.go_evidence == 'experimental':
    args.go_evidence = set(["EXP","IDA","IPI","IMP","IGI","IEP"])
    args.go_excluded = set(["ND", "IEA"])
elif args.go_evidence == 'non-electronic':
    args.go_evidence = None
    args.go_excluded = set(["ND", "IEA"])
```
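One possible reading of the two sets: `go_evidence` is an optional whitelist (`None` meaning "accept any code") and `go_excluded` is a blacklist applied in both modes. The sketch below assumes that annotations come as (GO term, evidence code) pairs and that a term is kept only if it passes both checks; `resolve_mode`, `keep_annotation`, `filter_terms` and the example data are hypothetical, not eggnog-mapper's internals. Under that assumption the two sets are not redundant overall: in experimental mode the blacklist adds nothing (ND and IEA are not on the whitelist anyway), but in non-electronic mode it is the only filter.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Evidence-code sets as defined in the quoted emapper.py snippet.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
EXCLUDED = {"ND", "IEA"}


def resolve_mode(mode: str) -> Tuple[Optional[Set[str]], Set[str]]:
    """Map a --go_evidence mode name to (whitelist, blacklist)."""
    if mode == "experimental":
        return set(EXPERIMENTAL), set(EXCLUDED)
    if mode == "non-electronic":
        return None, set(EXCLUDED)  # None = no whitelist, accept any code
    raise ValueError(f"unknown mode: {mode}")


def keep_annotation(code: str, whitelist: Optional[Set[str]], blacklist: Set[str]) -> bool:
    """Assumed rule: reject blacklisted codes; if a whitelist exists, require membership."""
    if code in blacklist:
        return False
    return whitelist is None or code in whitelist


def filter_terms(annotations: Iterable[Tuple[str, str]], mode: str) -> List[str]:
    """Return the terms whose evidence code survives the chosen mode."""
    whitelist, blacklist = resolve_mode(mode)
    return [term for term, code in annotations
            if keep_annotation(code, whitelist, blacklist)]


# Hypothetical annotations of one ortholog: (placeholder term, evidence code).
example = [("termA", "IDA"), ("termB", "ISS"), ("termC", "IEA"), ("termD", "TAS")]

print(filter_terms(example, "experimental"))    # ['termA']
print(filter_terms(example, "non-electronic"))  # ['termA', 'termB', 'termD']
print(EXPERIMENTAL & EXCLUDED)                  # set(): the lists never overlap
```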

Why does one have to choose between experimental and non-electronic? Technically, not filtering at all is also a valid approach, isn't it?

Why is there a whitelist (args.go_evidence) AND a blacklist (args.go_excluded)? Wouldn't either one alone be sufficient, and more straightforward for this task?

As an idea, what about an additional option that allows user-defined whitelisting? E.g. --go_evidence EXP,IDA,IPI,IEP. This would only take into consideration annotations with the listed evidence codes and discard the rest.
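The proposed user-defined whitelist could be sketched with an argparse `type=` callable that accepts either one of the two existing preset names or a comma-separated list of evidence codes. This is a hypothetical design, not an existing eggnog-mapper option; `go_evidence_arg` and `PRESETS` are illustrative names, and a custom list is assumed to be used as a pure whitelist with an empty blacklist.

```python
import argparse
from typing import Optional, Set, Tuple

# Presets copied from the quoted emapper.py snippet: (whitelist, blacklist).
PRESETS = {
    "experimental": ({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}, {"ND", "IEA"}),
    "non-electronic": (None, {"ND", "IEA"}),
}


def go_evidence_arg(value: str) -> Tuple[Optional[Set[str]], Set[str]]:
    """Parse --go_evidence as a preset name or a comma-separated whitelist.

    Hypothetical behaviour: a custom list is returned as the whitelist with an
    empty blacklist, so only the listed codes would be kept.
    """
    if value in PRESETS:
        whitelist, blacklist = PRESETS[value]
        # Copy the sets so callers cannot mutate the shared presets.
        return (set(whitelist) if whitelist is not None else None, set(blacklist))
    codes = {c.strip().upper() for c in value.split(",") if c.strip()}
    # Basic sanity check only: evidence codes are short alphabetic tokens.
    if not codes or not all(c.isalpha() for c in codes):
        raise argparse.ArgumentTypeError(
            f"expected a preset ({', '.join(PRESETS)}) or comma-separated codes, got {value!r}")
    return codes, set()


parser = argparse.ArgumentParser(prog="emapper_sketch")
parser.add_argument("--go_evidence", type=go_evidence_arg, default="non-electronic")

args = parser.parse_args(["--go_evidence", "EXP,IDA,IPI,IEP"])
print(sorted(args.go_evidence[0]))  # ['EXP', 'IDA', 'IEP', 'IPI']
```

`choices=` has to be dropped in this design because argparse checks `choices` against the value returned by `type`, which here is a tuple rather than a string. A string `default` is also passed through `type`, so the default stays equivalent to the current non-electronic behaviour.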

Any ideas / explanations are appreciated!

Happy holidays! :christmas_tree: :santa: :gift:

Cheers, David

jhcepas commented 5 years ago

Hi David, using "go evidence = experimental only" makes the predictions more reliable. Otherwise, you will infer GO terms from orthologs that were themselves annotated using homology-based methods, which introduces some circularity into the annotation process. For that reason, all electronic-only GO annotations are excluded in eggnog-mapper. You can choose between two options: annotations that may be homology-based but were at least manually curated (which increases annotation coverage), or, being stricter, only experimental evidence as the functional source.

hope it helps! -jaime
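To make the circularity argument concrete, the sketch below lists a partial set of GO evidence code categories and checks which of them pass each mode. It uses the whitelist/blacklist pairs from the quoted emapper.py snippet and assumes a code is kept only if it is not blacklisted and, when a whitelist exists, is on it. The grouping follows the GO Consortium's evidence code categories; `CATEGORIES`, `MODES`, `passes` and `survival_table` are hypothetical names.

```python
from typing import Dict, List, Optional, Set, Tuple

# Partial list of GO evidence codes grouped by category.
CATEGORIES: Dict[str, List[str]] = {
    "experimental": ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"],
    "computational (curator-reviewed)": ["ISS", "ISO", "ISA", "ISM", "IGC", "RCA"],
    "phylogenetic": ["IBA", "IBD", "IKR", "IRD"],
    "author/curator statement": ["TAS", "NAS", "IC"],
    "no data": ["ND"],
    "electronic only": ["IEA"],
}

# (whitelist, blacklist) per mode, as in the quoted emapper.py snippet.
MODES: Dict[str, Tuple[Optional[Set[str]], Set[str]]] = {
    "experimental": ({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}, {"ND", "IEA"}),
    "non-electronic": (None, {"ND", "IEA"}),
}


def passes(code: str, whitelist: Optional[Set[str]], blacklist: Set[str]) -> bool:
    """Assumed rule: not blacklisted and, if a whitelist exists, on it."""
    return code not in blacklist and (whitelist is None or code in whitelist)


def survival_table() -> Dict[str, Dict[str, bool]]:
    """For each category, report whether all of its codes pass each mode."""
    return {cat: {mode: all(passes(c, wl, bl) for c in codes)
                  for mode, (wl, bl) in MODES.items()}
            for cat, codes in CATEGORIES.items()}


for cat, row in survival_table().items():
    cells = "  ".join(f"{mode}={'yes' if ok else 'no'}" for mode, ok in row.items())
    print(f"{cat:34} {cells}")
```

Under these assumptions, curator-reviewed similarity-based codes such as ISS survive the default non-electronic mode but not the experimental one, while IEA is dropped in both, which matches the trade-off between coverage and strictness described above.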

(jhcepas's reply was sent by email, in response to David Seide's post of Fri, 21 Dec 2018, 13:46.)

davidbio commented 5 years ago

Thanks for the clarification!