@aazzaa123 I cannot reproduce the problem here. Could you provide more detail about the bug you are facing? Could you please follow the guidelines (I have copied them again below)?
Is this the same question as https://stackoverflow.com/questions/51370372/delete-tables-and-figures-from-a-set-of-docx-files-using-r?
If so, it seems the question is how to extract content from a file rather than how to delete content. This is documented here: https://davidgohel.github.io/officer/articles/officer_reader.html#import-word-document. You would have to filter elements where content_type %in% "paragraph".
David
[ ] The code that is producing the error, it has to be a minimal reproducible example.
Stack Overflow provides a good explanation: https://stackoverflow.com/help/mcve. You can use the reprex package to help you: http://reprex.tidyverse.org/. The most popular R question on Stack Overflow covers this subject: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example.
[ ] The results of the R command sessionInfo(). It has to be executed after you have loaded the packages used by your example. This lets me know your version of R and the versions of the packages used in your example.
[ ] You checked that you have the latest version of the package from CRAN (and from GitHub if the issue exists with the CRAN version).
[ ] You searched the open and closed issues on the GitHub repository.
Yes - I saw it but it is not reproducible :) So it's a question and not an issue.
As already written, the answer is documented here: https://davidgohel.github.io/officer/articles/officer_reader.html#import-word-document.
You don't have to delete anything; you have to use docx_summary and filter on the column content_type. You will have to keep elements where content_type %in% "paragraph" and maybe drop paragraphs where style_name is 'captions' (or whatever style name you used for captions).
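The filtering described above can be sketched as follows. This is a minimal illustration, not code from the officer package: keep_body_paragraphs() and its caption_styles argument are hypothetical names, and the demo data frame is a hand-built stand-in for the output of docx_summary(), which has more columns in practice.

```r
# Hypothetical helper (not part of officer): keep paragraph rows and drop
# rows whose style_name matches a caption style.
# `caption_styles` is an assumption; use the style names found in your files.
keep_body_paragraphs <- function(summary_df, caption_styles = c("caption")) {
  is_paragraph <- summary_df$content_type %in% "paragraph"
  # Case-insensitive match; rows with a missing style are never captions.
  is_caption <- tolower(summary_df$style_name) %in% tolower(caption_styles)
  summary_df[is_paragraph & !is_caption, , drop = FALSE]
}

# Stand-in for docx_summary() output, built by hand for illustration.
demo <- data.frame(
  content_type = c("paragraph", "paragraph", "table cell", "paragraph"),
  style_name   = c("Normal", "caption", NA, NA),
  text         = c("Body text.", "Table 1: Primers", "Cx31-4F", "Unstyled text."),
  stringsAsFactors = FALSE
)

body_text <- keep_body_paragraphs(demo)$text
print(body_text)
```

With a real file, the same helper would be applied to docx_summary(read_docx(path)). This only works if the captions actually carry a distinct style in the document.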
KR
Can you show me these in the results of docx_summary?
Sorry, your code is not reproducible, PLEASE follow the guidelines explained in the issue template.
sessionInfo()
Can you add a docx file to import? (You should be able to drag and drop it into a new comment in this thread, and it will be uploaded.)
Here is some code:
library(officer)
doc <- read_docx("~/Downloads/issue143/Mitochondrial.DNA.docx")
data <- docx_summary(doc)
data <- data[data$content_type %in% "paragraph", ]
# the paragraph text is in data$text
A sample of Mitochondrial.DNA.docx can be seen below:
# sample(data$text, size = 20)
[1] "paired and control Thai individuals, Clin. Genet. 66 (2004)"
[2] "[4] X. Estivill, N. Govera, E. Barcelo, C. Badenas, E. Romeo, L. Moral,"
[3] "the restriction endonuclease HaeIII (Amersham Pharmacia Biotech). In"
[4] "associated with the mitochondrial tRNASer(UCN) gene, as"
[5] ""
[6] "T7511C mutation in the mitochondrial DNA tRNASer(UCN) gene,"
[7] "using standard procedures [23]."
[8] "Cx31-4F50 GCTCTGCTACCTCATCTGCC 3020224"
[9] "cycles: 40 s at 94 °C, 50 s at 67 °C–58 °C, and 1 min at 72 °C, and then 35"
[10] "[18] D.P. Kelsell, J. Dunlop, H.P. Stevens, N.J. Lench, J.N. Liang, G."
[11] "at mitochondrial nucleotides 750 and 1438 were observed in"
[12] "Cx30-3R50 AGCAGCAGGTAGCACAACTC 3020"
[13] "Ackah, J. Wu, D.I. Choo, M.X. Guan, Mutational analysis of the"
[14] "R. Scozzi, L. D’Urbano, M. Zeviani, A. Torroni, Familial progressive"
[15] "HaeIII digest; Un, undigested PCR product."
[16] "We thank the patients and their families for their coop-"
[17] "the most frequent mutation in the GJB2 gene accounting"
[18] "Direct sequencing of the GJB3 gene revealed a new poly-"
[19] "described by Wattanasirichaigoon et al. [33]."
[20] "maternal pattern of inheritance."
In your SO question, you ask how to delete all the tables and captions. This is not possible with your example document, as it contains neither named styles nor tables. All the content is unformatted, and the tables are not real tables but indented text:
> table(data$content_type, data$style_name)
< table of extent 1 x 0 >
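The extent of 1 x 0 arises because table() drops NA values by default: there is one content type and no non-missing style name. A small check along these lines can make that explicit before attempting any filtering. It is only a sketch: describe_structure() is a hypothetical helper, the demo data frame is a hand-built stand-in for docx_summary() output, and the "table cell" label is assumed to be the value docx_summary() uses for table content.

```r
# Hypothetical helper: report what structure a docx_summary()-like
# data frame contains, so it is clear whether filtering can work at all.
describe_structure <- function(summary_df) {
  styles <- unique(summary_df$style_name)
  styles <- styles[!is.na(styles)]  # missing styles carry no information
  list(
    n_paragraphs  = sum(summary_df$content_type %in% "paragraph"),
    # Assumption: table content is labelled "table cell" in content_type.
    n_table_cells = sum(summary_df$content_type %in% "table cell"),
    style_names   = styles
  )
}

# Stand-in mimicking an unformatted document: paragraphs only, no styles.
unstyled <- data.frame(
  content_type = rep("paragraph", 3),
  style_name   = rep(NA_character_, 3),
  text         = c("using standard procedures [23].", "", "maternal pattern of inheritance."),
  stringsAsFactors = FALSE
)

info <- describe_structure(unstyled)
str(info)
```

When n_table_cells is zero and style_names is empty, as here, nothing in the summary distinguishes tables or captions from ordinary text.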
no, sorry.
Does anyone know anything about deleting all tables and figures from a set of docx files (about 400 files)? I tried the officer package, but it works with keywords and I have no common pattern across the files. Is there a parameter to reach the tables and figures directly, or is there some other solution?