Create R Script for Sorting Vignettes from Trello List

BerylKanali commented 1 year ago

@jwokaty I have put everything that currently works here, and I will update it as we improve the script. Most of it is hardcoded for now. If you have any comments or suggestions, let me know.

Milestones:

We can import data and retrieve package names and get their specific ranks then display the package name and rank.
Packages displayed are from version 3.17

Obstacles:

Getting the individual maintainer for each package: the function is set up to get all of the packages of a given maintainer at once. (Still researching this.)
At some point the code that displays individual ranks breaks; I am trying to figure out why.

To do:

Create a function that sorts the packages by rank and then assigns each one a category (high or low, depending on a certain threshold).
Display all of this in an Excel file, which will be the final output file.
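The two to-do items could be sketched roughly as follows. This is only an illustration: the package names and rank values are made up, and the function name `categorize_by_rank`, the column names, the threshold of 50, and the CSV output (standing in for the Excel file, which would need an extra package) are all assumptions. It also assumes that a smaller rank value means a more downloaded package.

```r
# Hypothetical input: package names with a download-rank value
# (all values are made up for illustration).
package_ranks <- data.frame(
  package = c("pkgC", "pkgA", "pkgB"),
  rank = c(72.5, 3.1, 40.0),
  stringsAsFactors = FALSE
)

# Sort by rank and label each package "High" or "Low".
# Assumption: a smaller rank value means a more downloaded package.
categorize_by_rank <- function(ranks, threshold) {
  sorted <- ranks[order(ranks$rank), ]
  sorted$priority <- ifelse(sorted$rank < threshold, "High", "Low")
  rownames(sorted) <- NULL
  sorted
}

prioritized <- categorize_by_rank(package_ranks, threshold = 50)
print(prioritized)

# Write the table to a CSV file, which Excel can open.
output_file <- file.path(tempdir(), "package_priorities.csv")
write.csv(prioritized, output_file, row.names = FALSE)
```

Keeping the threshold as a function argument makes it easy to change when the script is reused on a different list.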

jwokaty commented 1 year ago

@BerylKanali, I wanted to give some general feedback before we pair program so that you have time to think about the process.

We don't need to find each package's maintainer; we just need to know if they're maintained by the Bioconductor core team to categorize them as To Do or Contact Maintainer (a new label) on the GitHub Project Board. Look back at #49 and make sure you understand all parts.
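A minimal sketch of that categorization, under two assumptions that are not stated in this thread: that core-team packages list the shared address maintainer@bioconductor.org in their Maintainer field, and that core-team packages get the To Do label while all others get Contact Maintainer. The function name `board_label` and the sample maintainer strings are hypothetical.

```r
# Assumption: core-team packages list this shared address as maintainer.
core_team_email <- "maintainer@bioconductor.org"

# Return the project-board label for one package, given its Maintainer field.
# Assumption: core-team packages are "To Do", the rest "Contact Maintainer".
board_label <- function(maintainer_field) {
  is_core <- grepl(core_team_email, maintainer_field, fixed = TRUE)
  if (is_core) "To Do" else "Contact Maintainer"
}

# Made-up maintainer strings for illustration.
maintainers <- c(
  pkgA = "Bioconductor Package Maintainer <maintainer@bioconductor.org>",
  pkgB = "Jane Doe <jane@example.org>"
)

# Apply the check to every package; the package names are kept as labels.
labels <- vapply(maintainers, board_label, character(1))
print(labels)
```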

Consider what might make good variables given that we'd like to reuse the script and that it might take time (several months or a year) before we use it again on a similar file.

Try to think in terms of functions to separate functionality. It's a choice when you have a small script, but then you can break the problem into smaller, testable pieces with clear functionality.

Try to choose variables whose names make it obvious what they contain and what their relation is to the surrounding code. You can imagine if you had a very large package with many variables or if you need to fix a package you wrote a year ago that you might forget the purpose of a variable. I would also avoid reusing the names of existing R functions, such as data, as variable or function names.

I would suggest using <- for assignment. = isn't wrong, but I think <- is preferred.

I can see how you experimented with the functions in BiocPkgTools, which is great. I also appreciate your documentation to help me understand where you're going. I didn't see any attempts to use one of the apply functions as we previously discussed. I would also think about how you're going to preserve the results and later put them in a file.
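As a rough illustration of the apply-function idea and of preserving results for later output, here is a sketch that uses `vapply` with a hypothetical `lookup_rank` function. The lookup returns made-up numbers so the example runs without network access; in the real script it would be replaced by the actual rank lookup.

```r
# Hypothetical stand-in for a per-package lookup such as a download rank.
# It returns made-up numbers so the sketch runs without network access.
lookup_rank <- function(package_name) {
  made_up_ranks <- c(pkgA = 3.1, pkgB = 40.0, pkgC = 72.5)
  made_up_ranks[[package_name]]
}

package_names <- c("pkgA", "pkgB", "pkgC")

# vapply calls lookup_rank once per name and checks that each result
# is a single number.
ranks <- vapply(package_names, lookup_rank, numeric(1))

# Collect the results in a data frame so they can be written to a file later.
results <- data.frame(package = package_names, rank = unname(ranks))
print(results)
```

Descriptive names such as `package_names` and `results` also avoid clashing with existing functions such as `data`.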

jwokaty commented 1 year ago

I figured out what was wrong with my code. I think maybe I didn't initialize version after I ran the loop.

The function I was looking for at the end of our session was attributes, which can tell you a little about an object's structure in the same way we were using class.
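A small example of the difference, using made-up objects: `class` reports what kind of object something is, while `attributes` lists the metadata attached to it.

```r
# A named numeric vector (values are made up for illustration).
example_vector <- c(first = 1, second = 2)

vector_class <- class(example_vector)            # "numeric"
vector_attributes <- attributes(example_vector)  # a list holding the names

example_table <- data.frame(package = "pkgA", rank = 0.1)
table_class <- class(example_table)              # "data.frame"
# Names of the metadata a data frame carries,
# including "names", "class" and "row.names".
table_attribute_names <- names(attributes(example_table))

print(vector_attributes)
print(table_attribute_names)
```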

I haven't encountered the error you mentioned, so let's look at that tomorrow.

BerylKanali commented 1 year ago

Great. We can try to figure it out tomorrow if I have not solved it by then.

BerylKanali commented 1 year ago

@jwokaty I have added the latest changes to the script.

BerylKanali commented 1 year ago

@jwokaty I made a commit earlier today with the suggested changes.

BerylKanali commented 1 year ago

Hi @jwokaty, I have removed implementation details, fixed spelling and wrapped long lines. I have also made some changes to the README details.

BerylKanali commented 1 year ago

@jwokaty I have gone through the script, made some changes, and then did a grammar and spell check on it. Please let me know if I have missed something.

I just realized that in the code the statement priority = ifelse(rank[[1]] > threshold, "High","Low") should be priority = ifelse(rank[[1]] < threshold, "High","Low"), if I am not wrong.

This is because the lower the rank number, the higher the priority of the package, e.g. S4Vectors.
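The effect of the comparison direction can be seen on a few made-up rank values (the package labels, the values, and the threshold of 50 are assumptions for illustration only):

```r
# Made-up rank values, where a smaller number means more downloads.
download_rank <- c(top_pkg = 0.1, mid_pkg = 45.0, low_pkg = 90.0)
threshold <- 50

# With ">" the most downloaded package is labelled "Low".
priority_greater <- ifelse(download_rank > threshold, "High", "Low")
# With "<" it is labelled "High".
priority_less <- ifelse(download_rank < threshold, "High", "Low")

print(rbind(priority_greater, priority_less))
```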

I also noticed a discrepancy, which may be why we did not notice the error above: the rank we get in our final Excel file is different from what we get if we look up the rank of a specific package. Example:

rank1 <- pkgDownloadRank("S4Vectors", "software", version)
rank1
2/2083
0.1

But in our Excel sheet (you can look at the one you shared on Slack) the entry is:

S4Vectors 45.05 High

This does not seem mathematically correct. I don't understand why we are getting 45.05.

We can discuss this tomorrow. If we figure this out, we will know whether the changed statement is correct or whether it should remain as it was initially.
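One way to check the arithmetic, assuming the label 2/2083 means position 2 out of 2083 packages and the printed value is that position expressed as a percentage. This reading is an assumption about the output, not something confirmed in this thread.

```r
# Assumption: "2/2083" is position / number of packages.
position <- 2
package_count <- 2083

# Position as a percentage of all packages, rounded to two decimals.
percent_rank <- round(position / package_count * 100, 2)
print(percent_rank)  # 0.1

# Position that a value of 45.05 would correspond to under the same reading.
implied_position <- round(45.05 / 100 * package_count)
print(implied_position)  # 938
```

Under this reading, 45.05 would correspond to a package near position 938 rather than position 2, so the two values cannot both describe S4Vectors in the same ranking.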

jwokaty commented 1 year ago

@BerylKanali Good work. I made a few changes to fix formatting, clarify the content of the input file and the implementation. You can see these changes at https://github.com/Bioconductor/sweave2rmd/pull/54/commits/5c29fcfeaac13c540357003418841356f60b83d2 for your reference. We'll be using this to find candidate vignettes for the next round of Outreachy.