duckduckgo / zeroclickinfo-fathead

DuckDuckGo Instant Answers based on keyword data files
https://duckduckhack.com/

PHP: Improve parser to cover more of the documentation and create redirect entries #616

Closed moollaza closed 7 years ago

moollaza commented 7 years ago

Description

We've added the names of all PHP Functions to the cover/ directory, and after running the test, it reported that over 18,000 functions from our coverage are missing in the output.txt.

We need to investigate why the output only contains approximately 4000 articles, and update the parser to grab the missing 14,000 articles.
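The coverage check described above can be sketched roughly as follows. This is only an illustration, assuming the cover file lists one keyword per line and that the first tab-separated field of each output.txt row is the article title; the function name `missing_keywords` and the case-insensitive comparison are hypothetical, not the project's actual test.

```python
def missing_keywords(cover_lines, output_lines):
    """Return coverage keywords that have no matching title in the output."""
    # Assumption: the title is the first tab-separated field of each output row.
    titles = {
        line.split("\t", 1)[0].strip().lower()
        for line in output_lines
        if line.strip()
    }
    # Assumption: one keyword per line in the cover file; blank lines are skipped.
    keywords = [line.strip() for line in cover_lines if line.strip()]
    return [kw for kw in keywords if kw.lower() not in titles]


# Made-up sample data, only to show the shape of the inputs.
cover = ["fopen", "datetime.add", "array_key_exists"]
output = ["fopen\tA\t\t...", "array-key-exists\tA\t\t..."]
print(missing_keywords(cover, output))
```

Grouping the result by kind (functions, class methods, other entries) would show which parts of the documentation the parser still has to cover.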

We also need to create redirects appropriately so that our output.txt also satisfies our coverage data. It may be that our coverage data needs to be modified, but that should be a last resort.
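One possible way to generate such redirect entries is sketched below. It assumes a tab-separated row whose first three fields are the title, the entry type ("R" for a redirect) and the target title, with the remaining fields left empty; the field count of 13, the helper names, and the sample article title are all assumptions for illustration and should be checked against the project's output format documentation.

```python
def redirect_row(alias, target, num_fields=13):
    """Build one tab-separated redirect row pointing alias at target."""
    # Assumption: field 1 is the title, field 2 the entry type ("R" for
    # redirect), field 3 the target title; the remaining fields stay empty.
    fields = [alias, "R", target] + [""] * (num_fields - 3)
    return "\t".join(fields)


def redirects_for(missing, aliases_to_titles):
    """Create redirect rows for missing keywords that map to a known article."""
    return [
        redirect_row(keyword, aliases_to_titles[keyword])
        for keyword in missing
        if keyword in aliases_to_titles
    ]


# Hypothetical mapping from a coverage keyword to an existing article title.
rows = redirects_for(
    ["datetime.add", "unknown"],
    {"datetime.add": "DateTime::add"},
)
```

Keywords with no known target are skipped rather than guessed, so they still show up as missing in the next coverage run.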

People to notify

@VitorVRS @gautamkrishnar

Resources


Instant Answer Page: https://duck.co/ia/view/php

VitorVRS commented 7 years ago

@moollaza The current parser only generates articles for PHP functions; most of the other entries are classes and methods.

Function example: https://secure.php.net/manual/en/function.fopen.php Class example: https://secure.php.net/manual/en/class.datetime.php

And there are a lot more "types" of data.

tagawa commented 7 years ago

That's correct. The coverage data list contains all the keywords from this search archive on php.net: https://secure.php.net/js/search-index.php?lang=en

I took the single item name for each entry, e.g. add, but because there are many duplicates I also included the full item path for each entry, e.g. datetime.add, event.add, etc.

Hence a lot of keywords but they should all match an article on php.net. There are also non-functions in there such as error names and reserved variable names.
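As a rough sketch of how both forms could be derived, assuming each entry is available as a dotted item path such as datetime.add (the real structure of the search index is not shown in this thread, and `coverage_keywords` is a hypothetical name):

```python
def coverage_keywords(item_paths):
    """Return the short name and the full path for every entry, de-duplicated."""
    seen = set()
    keywords = []
    for path in item_paths:
        # "datetime.add" -> "add"; a path without a dot is its own short name.
        short_name = path.rsplit(".", 1)[-1]
        for keyword in (short_name, path):
            if keyword not in seen:
                seen.add(keyword)
                keywords.append(keyword)
    return keywords


print(coverage_keywords(["datetime.add", "event.add", "fopen"]))
# ['add', 'datetime.add', 'event.add', 'fopen']
```

The short name add appears only once even though two entries share it, which is why the full paths are needed to tell the entries apart.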

VitorVRS commented 7 years ago

@tagawa should the cover file be split into separate files according to context?

Or should we make the parser read that information too?

tagawa commented 7 years ago

@VitorVRS Good question - I was wondering that too. I think we should keep it as a single file for now. I've made a small change to the internal code so the articles will be triggered with ... function, ... class, ... keyword, etc. Then we can monitor the coverage and see if we should split the file later, although that might be a lot of manual work.

ghost commented 7 years ago

Is someone already working on this? If not, could I work on it?

davidjosephhayes commented 7 years ago

I would be down to help

moollaza commented 7 years ago

@ioanmoldovan @davidjosephhayes no one is currently working on this -- it's all yours 👍

mayank commented 7 years ago

@tagawa can I pick this?

gautamkrishnar commented 7 years ago

Yes @mayank you can carry on... 👍

mayank commented 7 years ago

@moollaza @tagawa When I search for php array_key_exists, the IA is not triggered, but when I search for php array-key-exists, it is. In output.txt the title is php array-key-exists. Or, @VitorVRS, the entries in functions.txt are wrong; they should be array_key_exists.

What needs to be done? Should I create redirects?

VitorVRS commented 7 years ago

@mayank actually you removed the code which replaces "-" with "_".

Look at this diff in your last PR: https://github.com/duckduckgo/zeroclickinfo-fathead/pull/678/files#diff-b1ea04c31c1b207fde401593e8a86892R151

This wrongly got past the review :sob:
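For illustration, a replacement of that kind might look like the sketch below. It is not the code removed in the PR; it assumes the function name is taken from a manual page URL of the form function.<name>.php, as in the fopen example earlier in this thread, and `function_name_from_url` is a hypothetical helper.

```python
import re


def function_name_from_url(url):
    """Extract a PHP function name from a manual URL, or None if it is not one."""
    match = re.search(r"/function\.([^/]+)\.php$", url)
    if match is None:
        return None
    # The manual uses "-" in file names where the function name has "_".
    return match.group(1).replace("-", "_")


print(function_name_from_url(
    "https://secure.php.net/manual/en/function.array-key-exists.php"
))
```

With the replacement in place the title becomes array_key_exists, which is the form listed in the coverage data.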

mayank commented 7 years ago

@VitorVRS it was not passing the tests then 😕; there is zero coverage for these types of functions