UBC-MDS / software-review-2022


Group 19 Submission: textfeatureinfor (R) #4

Open Jacq4nn opened 2 years ago

Jacq4nn commented 2 years ago

name: textfeatureinfor about: This R package extracts information from text features, which can be useful for feature engineering or in other data science projects.


Submitting Authors:

Repository: textfeatureinfor Version submitted: 0.0.0.9 Submission type: Standard Editor: RB Reviewers:

Archive: TBD Version accepted: TBD


Package: textfeatureinfor
Title: Text Features
Version: 0.0.0.9000
Authors@R: 
    c(person(given = "Lynn",
           family = "Wu",
           role = c("aut", "cre"),
           email = "lynnwbl@gmail.com"),
    person(given = "Kiran",
           family = "Phaterpekar",
           role = "aut",
           email = "kphaterp@student.ubc.ca"),
    person(given = "Jacqueline",
           family = "Chong",
           role = "aut",
           email = "jacqann@student.ubc.ca"),
    person(given = "Paniz",
           family = "Fazlali",
           role = "aut",
           email = "paniz.fazlali@gmail.com"))
Description: Package to extract interesting details about text.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
Imports: 
    rapportools,
    stopwords,
    stringr,
    stringi
Suggests: 
    testthat (>= 3.0.0)
Config/testthat/edition: 3

Scope

Our package aims to let users count the punctuation marks in a text, calculate its average word length, compute the percentage of fully capitalised words, and remove stopwords from it.

Data scientists and casual programmers who would like to carry out basic text feature engineering with fewer lines of code.

Yes. textfeatures, qdap and stopwords are some of the well-established packages. Our package aims to combine and simplify common text feature engineering steps into single functions, to reduce the number of lines of code.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

Jacq4nn commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing:


Review Comments

joshsia commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 1.5 hours


Review Comments

  1. Running avg_word_len("it's me") returns 3, which is unexpected. I am not sure what is causing this. I would have expected either 1.67 if the three "words" are "it", "s" and "me", or 2.5 if the words are "its" and "me".

  2. Running perc_cap_words("I") returns 0, which is unexpected behaviour. This could be because the function uses stringr::str_count(text, "\\b[A-Z]{2,}\\b"), which only matches words of at least 2 characters. For the same reason, running perc_cap_words("I AM A BOY") returns 50 instead of 100.

  3. It would be great to add automated testing, which I'm sure you will include soon!

  4. It would be nice to add the Contributing and License sections to the README so that it is clear how you want other people to work on the package.

  5. It would be nice to have a vignette which demonstrates use of the functions in a single file and maybe even host it online.

  6. In the future, I think it would be nice to include functions that compute the percentage of words in all lower case, and maybe the median word length.
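
Comments 1 and 2 can be reproduced with stringr alone. The sketch below is not the package's implementation: the variable names are made up for illustration, and the whitespace-only split in (a) is only a guess at why avg_word_len("it's me") gives 3.

```r
library(stringr)

# Comment 1: three plausible tokenisations of "it's me".
text <- "it's me"

# (a) Split on whitespace only: "it's" (4 characters) and "me" (2).
#     Assumption: this may be what produces the reported value of 3.
tokens_ws <- str_split(text, "\\s+")[[1]]
avg_ws <- mean(str_length(tokens_ws))            # 3

# (b) Drop punctuation first: "its" and "me".
tokens_nopunct <- str_split(str_remove_all(text, "[[:punct:]]"), "\\s+")[[1]]
avg_nopunct <- mean(str_length(tokens_nopunct))  # 2.5

# (c) Keep runs of letters only: "it", "s" and "me".
tokens_letters <- str_extract_all(text, "[A-Za-z]+")[[1]]
avg_letters <- mean(str_length(tokens_letters))  # 5 / 3, about 1.67

# Comment 2: {2,} skips one-letter words such as "I" and "A"; + counts them.
caps_text <- "I AM A BOY"
n_words <- str_count(caps_text, "\\S+")          # 4 whitespace-separated words
pct_two_plus <- 100 * str_count(caps_text, "\\b[A-Z]{2,}\\b") / n_words  # 50
pct_one_plus <- 100 * str_count(caps_text, "\\b[A-Z]+\\b") / n_words     # 100
```

Whichever tokenisation is intended, stating it in the function documentation would make results like these predictable for users.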

khalidcawl commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing:


Review Comments

Well done team! I am sure the project is in progress and you are improving it as I write this review. Here are my comments:

nicovandenhooff commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 30 minutes


Review Comments

Overall well done, this package definitely seems useful for NLP tasks and could assist with feature engineering. I understand that the package is in progress so you may already be working on some of the comments below.

nickmao1994 commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 1hr


Review Comments

Overall, well done team! R does not seem to have a library for extracting punctuation, so building the R package might be a little harder. Here are my comments: