Azure / Microsoft365R

R SDK for interacting with Microsoft 365 APIs

Scraping emails and attachments; cannot pivot_longer for listing attachments #193

Closed jic007 closed 11 months ago

jic007 commented 11 months ago

I am trying to scrape my emails from Outlook and list their attachments for some work. I have hundreds of emails and files, and this would speed up my workflow.

I succeeded in accessing the emails and listing their attachments, but found one issue: I cannot split the list object of email attachments into rows.

For example:

    # Create a tibble with a column containing a list of lists
    tbl <- tibble(x = list(list(1, 2), list(3, 4), list(5, 6)))

    # Unnest the list of lists using unnest_longer()
    unnest_list <- unnest_longer(tbl, x)

    # Print the unnested list
    print(unnest_list)

I was expecting that if I applied something similar, each attachment would come out in a separate row, duplicating basic info like the email subject, but it just won't come out of list format.

    library(tidyverse)
    library(Microsoft365R)

    outlook_app <- get_business_outlook()

    # Get the Inbox folder
    inbox <- outlook_app$get_inbox()

    # Get the specified number of emails
    emails <- inbox$list_emails(n = 2)

    # List of the first email's attachments
    emails[[1]]$list_attachments()

    # First attachment of the first email
    emails[[1]]$list_attachments()[1]

    # Tried
    tibble(files = emails[[1]]$list_attachments()) |>
      mutate(row_id = row_number(),
             files_extracted = map(row_id, \(x) { emails[[1]]$list_attachments()[x] }))

    tibble(files = emails[[1]]$list_attachments()) |>
      mutate(row_id = row_number(),
             files_extracted = map(row_id, \(x) { pluck(files, x) }))

    tibble(files = emails[[1]]$list_attachments()) |>
      mutate(row_id = row_number(),
             files_extracted = files |> unnest_longer())

hongooi73 commented 11 months ago

> `emails[[1]]$list_attachments()[1]`

This should be `emails[[1]]$list_attachments()[[1]]`
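To illustrate the difference between `[1]` and `[[1]]`, here is a minimal base R sketch. The `attachments` list and its `name` field are hypothetical stand-ins for whatever `list_attachments()` returns, not the real Microsoft365R objects.

```r
# A plain list standing in for the result of list_attachments() (hypothetical data).
attachments <- list(
  list(name = "report.pdf"),
  list(name = "data.xlsx")
)

# Single brackets return a sub-list of length 1 that still wraps the element.
wrapped <- attachments[1]

# Double brackets return the element itself, so its fields are reachable.
first <- attachments[[1]]
first_name <- first$name

# With single brackets, the same field lookup finds nothing and returns NULL.
wrapped_name <- wrapped$name
```

This is why extracting with `[` inside `map()` tends to leave everything nested one level deeper than expected.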

jic007 commented 11 months ago

> > `emails[[1]]$list_attachments()[1]`
>
> This should be `emails[[1]]$list_attachments()[[1]]`

Thank you for this, but what I am trying to achieve is a tibble that looks something like the one below. Suppose you have one email with 4 attachments. I need to see the 4 attachments listed in a tibble with their names and other possible metadata.

| subject | attachment name |
| --- | --- |
| email subject 01 | attachment 01 |
| email subject 01 | attachment 02 |
| email subject 01 | attachment 03 |
| email subject 01 | attachment 04 |

I tried the below, but it didn't work:

    tibble(
      subject = emails[[1]]$properties$subject,
      files = emails[[1]]$list_attachments()
    ) |>
      mutate(row_id = row_number(), attach_no = length(files)) |>
      rowwise() |>
      mutate(
        files_extracted = map(row_id, \(x) {
          emails[[1]]$list_attachments()[[x]]
        })
      )

It returns:

    subject files row_id attach_no files_extracted
    1 Subject xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1 8
    2 Subject xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 2 8
    3 Subject xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 3 8
    4 Subject xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 4 8
    5 Subject xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 5 8
    6 Subject xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 6 8
    7 Subject xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 7 8
    8 Subject xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 8 8
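One possible way to get one row per attachment is to pull the plain metadata values out of each attachment object first, rather than trying to unnest the objects themselves. The sketch below is not tested against a live mailbox: `fake_email()` and `fake_attachment()` are hypothetical helpers that only mimic the shape used in this thread (`$properties$subject` and `$list_attachments()`), and the attachment fields `$properties$name` and `$properties$size` are assumptions about what the real objects expose.

```r
library(tibble)
library(purrr)
library(dplyr)

# Hypothetical stand-in for an attachment object.
# Assumption: the real objects keep their metadata under $properties.
fake_attachment <- function(name, size) {
  list(properties = list(name = name, size = size))
}

# Hypothetical stand-in for an email object: a subject plus a
# list_attachments() method that returns a list of attachments.
fake_email <- function(subject, attachments) {
  list(
    properties = list(subject = subject),
    list_attachments = function() attachments
  )
}

emails <- list(
  fake_email("email subject 01", list(
    fake_attachment("attachment 01.pdf", 1200),
    fake_attachment("attachment 02.xlsx", 3400)
  )),
  fake_email("email subject 02", list(
    fake_attachment("attachment 03.docx", 560)
  ))
)

# Build one small tibble per email, then stack them.
# The length-1 subject is recycled across that email's attachments.
attachment_tbl <- map(emails, \(email) {
  atts <- email$list_attachments()
  tibble(
    subject = email$properties$subject,
    attachment_name = map_chr(atts, \(a) a$properties$name),
    attachment_size = map_dbl(atts, \(a) a$properties$size)
  )
}) |>
  bind_rows()

print(attachment_tbl)
```

With real data, the `emails` list from `inbox$list_emails()` would take the place of the stand-ins, provided the property names match. Extracting character and numeric vectors with `map_chr()` and `map_dbl()` avoids asking tidyr to unnest objects it may not treat as vectors.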