JuliaText / TextAnalysis.jl

Julia package for text analysis
Other
373 stars 95 forks source link

Major Documentation Revamp #134

Closed Ayushk4 closed 5 years ago

Ayushk4 commented 5 years ago

This PR is aimed at improving the docs in the following ways -

Also testing for the possible errors in the package.

Progress

Ayushk4 commented 5 years ago

I am also replacing the deprecated functions in the documentation. So, far I have finished documents.md.

Ayushk4 commented 5 years ago

@aviks I was going through the docs. I found that names (used for setting metadata - ~author entire corpus of docs~ name for the docs in the corpus ), conflicts with the base refer (v0.7) and (v1). What do you suggest about this? Should I open an issue?

aviks commented 5 years ago

author maybe?

On Sun, Mar 17, 2019 at 9:06 AM Ayush Kaushal notifications@github.com wrote:

@aviks https://github.com/aviks I was going through the docs. I found that names https://github.com/JuliaText/TextAnalysis.jl/blob/b3c467cebb776c6a7e686cfe2b6c9a20c2ca2205/src/metadata.jl#L32 (used for setting metadata - author name for entire corpus of docs), conflicts with the base refer (v0.7) https://docs.julialang.org/en/v0.7/base/base/#Base.names and (v1) https://docs.julialang.org/en/v1/base/base/#Base.names. What do you suggest about this? Should I open an issue?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JuliaText/TextAnalysis.jl/pull/134#issuecomment-473643205, or mute the thread https://github.com/notifications/unsubscribe-auth/AAXIJuB1qmtFTXVLGmXZV1DwMdYi5POJks5vXgWRgaJpZM4b3U1e .

Ayushk4 commented 5 years ago

A correction - Author/authors is already used for document's author, should I change the metadata containing the document name to documentName. Also, I am sending a separate PR for this.

Ayushk4 commented 5 years ago

The following are the metadata - names -> setting the document's name and authors for setting the author's name.

This is also missing from the documentation for accessing the corpus metadata, I am adding it.

zgornel commented 5 years ago
Ayushk4 commented 5 years ago
Ayushk4 commented 5 years ago

I think that maybe we can start by opening an issue of what all major changes are to be done for TextAnalysis.jl, taking into consideration the things mentioned in the conversation above. Thank you, for your review on the situation.

Ayushk4 commented 5 years ago

@aviks Can you please suggest your views on this, so that I may proceed accordingly.

aviks commented 5 years ago

Thanks Ayush, this looks good. Let me know once this is ready to merge. I'm happy to merge things over from StringAnalysis if Corneliu is ok with that.

zgornel commented 5 years ago

@aviks I'm perfectly fine with this; feel free to port to TextAnalysis whatever you see fit

aviks commented 5 years ago

Separate one please

On Mon, 18 Mar 2019, 18:48 Ayush Kaushal, notifications@github.com wrote:

@Ayushk4 commented on this pull request.

In docs/src/corpus.md https://github.com/JuliaText/TextAnalysis.jl/pull/134#discussion_r266590785 :

@@ -83,3 +139,56 @@ corpus. The easiest way to do this is to convert a Corpus object into a DataFrame:

 convert(DataFrame, crps)

+ +## Corpus Metadata + +You can also retrieve the metadata for every document in a Corpus at once: + + languages(): What language is the document in? Defaults to Languages.English(), a Language instance defined by the Languages package. + names(): What is the name of the document? Defaults to "Unnamed Document".

Do you prefer that I send another PR making changes, name -> title & name! -> title! & names -> titles & names! to titles! (across the entire codebase, including tests, docs) ? Or include in this one itself?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JuliaText/TextAnalysis.jl/pull/134#discussion_r266590785, or mute the thread https://github.com/notifications/unsubscribe-auth/AAXIJswvIK-wRXsrqwkjVruj7A05Stdaks5vX9-NgaJpZM4b3U1e .

Ayushk4 commented 5 years ago

@aviks I am almost done with the PR. All that remains to be done, is #135 being merged so that I can remove the merge conflicts that may arise.

Also, lsa() is not working. I will get more details about it and file a detailed issue.

Ayushk4 commented 5 years ago

@aviks Do you think, BM25 and Co-occurrence matrix will be a good addition to TextAnalysis.jl?

aviks commented 5 years ago

@aviks Do you think, BM25 and Co-occurrence matrix will be a good addition to TextAnalysis.jl?

Yes please!