dreikanter / wp2md

A script to convert Wordpress XML dump to markdown files
GNU General Public License v3.0

only the first sentence of content is exported #19

Open davidak opened 8 years ago

davidak commented 8 years ago

Also, only the first sentence of comments is exported.

title: PDF in Word konvertieren mit Tabellen
link: http://davidak.de/blog/1161-pdf-in-word-konvertieren/
author: davidak
description: 
post_id: 1161
created: 2010/06/11 09:18:51
created_gmt: 2010/06/11 08:18:51
comment_status: open
post_name: pdf-in-word-konvertieren
status: publish
post_type: post

# PDF in Word konvertieren mit Tabellen

Man kann ganz einfach PDF-Dokumente in Word-Dokumente konvertieren, indem man den kostenlose online Service von NitroPDF benutzt.

## Comments

**[FrankD](#2323 "2010-06-12 22:23:42"):** Hä?

**[hanna](#2443 "2010-07-06 17:48:06"):** wow, danke fürd en tipp. hab mich schon oft damit herumgeärgert.

**[davidak](#2326 "2010-06-13 19:38:52"):** @FrankD:

swantzter commented 8 years ago

I can confirm this error, and I'm trying hard to work around it because we need this script...

automaciej commented 8 years ago

I've just run into this, and from poking around the code, it seems that the problem lies in the interaction between the CustomParser class and the XMLParser class. CustomParser expects data() to be called only once per item, but in reality XMLParser can call CustomParser.data() multiple times for one item; unfortunately, CustomParser throws away all data except what arrives in the first call.

I'm running wp2md under Python 3.4.
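
To illustrate the behaviour described above, here is a minimal sketch of a parser target that collects every text chunk instead of keeping only the first one. It is not the actual wp2md code: `AccumulatingTarget` and `parse_text` are hypothetical names, and the sketch assumes a flat element with no nested children. It relies only on the standard `xml.etree.ElementTree.XMLParser` target interface (`start`, `data`, `end`, `close`).

```python
import xml.etree.ElementTree as ET


class AccumulatingTarget:
    """Hypothetical parser target; NOT the real wp2md CustomParser."""

    def __init__(self):
        self.items = []    # finished (tag, text) pairs
        self._chunks = []  # text pieces received for the current element

    def start(self, tag, attrib):
        # A new element begins: forget chunks from the previous one.
        self._chunks = []

    def data(self, text):
        # May be called several times per element, so append
        # rather than overwrite or ignore later calls.
        self._chunks.append(text)

    def end(self, tag):
        # The element is complete: join all chunks into one string.
        self.items.append((tag, "".join(self._chunks)))
        self._chunks = []

    def close(self):
        return self.items


def parse_text(xml_source):
    """Parse an XML string and return the collected (tag, text) pairs."""
    parser = ET.XMLParser(target=AccumulatingTarget())
    parser.feed(xml_source)
    # XMLParser.close() returns whatever the target's close() returns.
    return parser.close()
```

Because the chunks are joined only in `end()`, the result does not depend on how many times the parser happens to call `data()` for a given element. Whether the same change would fix wp2md itself is an assumption that would need checking against its source.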

diminuto commented 8 years ago

Hello! I am getting the same problem. For example, my post http://capitanmacuto.com/sardinia/ was exported with only its first paragraph:

title: Sardinia
link: http://capitanmacuto.com/sardinia/
author: macu
description:
post_id: 58
created: 2016/05/08 21:23:11
created_gmt: 2016/05/08 21:23:11
comment_status: closed
post_name: sardinia
status: publish
post_type: post

# Sardinia

Sardinia is an Italian island located in the western Mediterranean Sea. I only had two days so I could only see the marine city of Alghero and the Nuraghe Santu Antine situated in the so-called 'Nuraghe Valley'.

I am running a fresh installation of wp2md and Python, both installed today.

swantzter commented 8 years ago

Well, I should've made a PR with my changes back to them. I'll dig up my local copy.

swantzter commented 8 years ago

@diminuto Basically, I think I got it working with the code on svbeon/wp2md. Note that it also includes some freenode-specific patches to give us just the right format, and I'm unsure whether the latest update got pushed.

If I remember right, I also ran `s/\n\n/\n/g` on the source XML. I don't remember whether I did that once or twice, though... or whether I did it at all. I tried a LOT of things, you see...
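
For anyone wanting to try the same substitution without sed, a rough Python equivalent might look like the sketch below. `collapse_blank_lines` is a hypothetical helper name, and it is not confirmed that this step had any effect on the bug; note that it also alters the paragraph breaks in the post content.

```python
import re


def collapse_blank_lines(text, passes=1):
    """Apply the substitution s/\\n\\n/\\n/g to text, `passes` times."""
    for _ in range(passes):
        # Each pass replaces every non-overlapping pair of newlines
        # with a single newline.
        text = re.sub(r"\n\n", "\n", text)
    return text
```

A single pass only halves a run of newlines, so four consecutive newlines become two after one pass and one after two passes. That may be why running it once or twice would give different results.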

I'll have to run git diff tomorrow and see... But generally: 61618989

diminuto commented 8 years ago

Thanks, Svbeon! I'm not sure how to install your wp2md version over my local copy, so I'll wait for your next reply :) Thanks very much!

swantzter commented 8 years ago

@diminuto Turns out I gave up and switched to thomasf/exitwp. My modifications can be found here: svben/exitwp