Open jefflee1103 opened 3 years ago
Hi Jeff,
My apologies for the late response! I was tied up with a grant submission deadline. As for my naive script on GitHub, I wrote it for a buddy a long time ago and, ummm, I haven't updated it recently! I was not aware of the problem, but let me look into it this weekend! (maybe!)
Cheers, Joseph
On Sat, May 22, 2021 at 10:42 AM Jeff YS Lee @.***> wrote:
Dear hangnoh,
Thank you for writing this awesome script! I'm not sure if you are making further updates to the 'flybaseR' package, but I noticed that the form variable in id.converter() is not compatible with the current FlyBase ID validator.
This is what the session html_form looks like as of May 2021: [image: image] https://user-images.githubusercontent.com/41636396/119230316-909e5380-bb13-11eb-8b00-8535214aee09.png
The id.converter() function seems to run fine (r5.44 converted to current) if I change form <- set_values(form.original, ids = paste(as.character(temp.x), collapse = "\n"), mode = "convert", convert = "fbgn") to form <- html_form_set(form.original, ids = paste(as.character(temp.x), collapse = "\n")), with the following output:
[image: image] https://user-images.githubusercontent.com/41636396/119230395-e541ce80-bb13-11eb-9de5-1f67e962505c.png
Is this the expected output assuming the code is running okay?
Kind regards,
Jeff
Yes, FlyBase made some minor changes to a few HTML tags a few years ago.
I will look into this next week. I wrote this script for my own temporary use because I was expecting that FlyBase would adopt something like an API. Wait, I think there's a way now? https://flybase.github.io/api/swagger-ui/
On Mon, Feb 13, 2023 at 12:04 PM rmd13 @.***> wrote:
Yes, FlyBase made some minor changes to a few HTML tags a few years ago.
I have no idea how to use the FlyBase API. But if the API works, does that mean rvest no longer works?
This shows the failure:
session <- html_session("http://flybase.org/")
Warning message:
In request_GET(session, url) : Forbidden (HTTP 403).
The user-agent hack below gets past that first step but does not work all the way through, because a similar error occurs in a downstream command:
library(rvest)
library(httr)
url <- "http://flybase.org"
uastring <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
session <- html_session(url,user_agent(uastring))
The hack then fails at this line, partway through the script:
aStock_Html <- read_html(aStock_Http)
Error in open.connection(x, "rb") : HTTP error 403.
Passing the user agent to read_html() as well still fails:
> aStock_Html <- read_html(aStock_Http,user_agent(uastring))
Error in open.connection(x, "rb") : HTTP error 403.
I guess the rvest method should work as well, so I am not sure what is causing the problem. I will check this out!
I learned that a REST API uses GET and POST requests to talk to a website, and that the FlyBase API offers many GET and POST commands. But the command list does not cover enough: I cannot find any API call that searches by fly stock name or gene name.
For example, to search a stock webpage to get the genotype text:
aStock_Http <- "http://flybase.org/reports/FBst0005047"
aStock_Html <- read_html(aStock_Http)
...and then the genotype can be parsed from the page.
And how would I do the equivalent of the following rvest commands through the REST API instead?
# get the search form from the page: j2g_search_form (uniquery.pl) or (export2batch.pl)
form.original <- html_form(session)
# fill in the form with the stock number to search for
aStock <- 150337
form <- set_values(form.original, field = "SYM", data_class = "Stock", query = aStock)
# submit the form to search for the stock number
result_raw <- submit_form(session, form)
I cannot find a search endpoint in the FlyBase API. All the commands need an exact FlyBase ID, so it looks far less flexible than rvest.
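For anyone following along: the form-filling steps above use the pre-1.0 rvest names. A minimal sketch of the same steps with the names introduced in rvest 1.0.0 (html_form_set() in place of set_values(), session() in place of html_session(), session_submit() in place of submit_form()) might look like the block below. It runs offline against a made-up form, so the form markup and the example.org action URL are assumptions for illustration only and are not FlyBase's actual page; the field names are simply copied from the snippet above. It also does nothing about the HTTP 403 responses.

```r
library(rvest)

# Hypothetical stand-in for a search form. The real FlyBase form may differ;
# only the field names (field, data_class, query) come from the snippet above.
page <- minimal_html('
  <form action="https://example.org/search" method="post">
    <input type="text" name="field">
    <input type="text" name="data_class">
    <input type="text" name="query">
  </form>')

# html_form() on a document returns a list of forms; take the first one.
form.original <- html_form(page)[[1]]

# html_form_set() is the rvest >= 1.0.0 name for set_values().
aStock <- 150337
form <- html_form_set(form.original,
                      field = "SYM",
                      data_class = "Stock",
                      query = as.character(aStock))

# Inspect the values that would be sent.
print(form$fields$query$value)

# With a live session, the remaining renames would be (not run here):
#   session    <- session(url)                  # was html_session(url)
#   result_raw <- session_submit(session, form) # was submit_form(session, form)
```

Filling the form offline like this is only a way to check that the field names match before pointing the code at the live site.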
Could the problem be that FlyBase is now taking action against web scraping?
Dear hangnoh,
Thank you for writing this awesome script! I'm not sure if you are making further updates to the 'flybaseR' package, but I noticed that the form variable in id.converter() is not compatible with the current FlyBase ID validator.
This is what the session html_form looks like as of May 2021, which no longer contains several fields in your original code:
The id.converter() function seems to run fine (r5.44 converted to current) if I change form <- set_values(form.original, ids = paste(as.character(temp.x), collapse = "\n"), mode = "convert", convert = "fbgn") to form <- html_form_set(form.original, ids = paste(as.character(temp.x), collapse = "\n")), with the following output:
Is this the expected output, assuming the code is running okay?
Kind regards,
Jeff