arnaldorusso / blogger2md

Convert entire blogger [.atom files] to a MarkDown output of each post

AttributeError: 'NoneType' object has no attribute 'strip' #1

Open dwlf opened 9 years ago

dwlf commented 9 years ago
$ python blogger2md.py --output-dir ~/tmp/blogger --html2text ~/Downloads/blog-11-10-2014.xml
2014-11-10 14:36:46 [INFO] root: parsing feed
Traceback (most recent call last):
  File "blogger2md.py", line 169, in <module>
    main()
  File "blogger2md.py", line 161, in main
    data = process_entry(entry)
  File "blogger2md.py", line 74, in process_entry
    title = title.text.strip().replace('\n', ' ')

I don't know if this code is still maintained, but I thought I would try running it before falling back to importing into WordPress and then exporting from there.

arnaldorusso commented 9 years ago

Hi @lloydde, I used this code to convert my own Blogger blog. What is your Python version? Maybe it's a Unicode problem (on Python 3).

dwlf commented 9 years ago
% python --version
Python 2.7.8

arnaldorusso commented 9 years ago

Hi @lloydde, could you try again?

dwlf commented 9 years ago

Thanks for the timely response, @arnaldorusso. Because of your encouragement I took another look. This is a really old blog, from 2002, and I now remember that this was before posts on Blogger had titles ;-)
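
For reference, a post without a title is exported as an empty `<title type="text"/>` element, and an empty element's `.text` is `None`, which is what `title.text.strip()` trips over. A minimal sketch of a guard, assuming the standard library's `xml.etree.ElementTree` (the script's actual parser may differ); `entry_title` and the `"untitled"` fallback are hypothetical names, not part of blogger2md:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def entry_title(entry, default="untitled"):
    """Return a one-line title for an Atom <entry>, or `default` if it has none.

    Hypothetical helper for illustration; not taken from blogger2md.
    """
    title = entry.find(ATOM_NS + "title")
    # .text is None for an empty element such as <title type="text"/>,
    # and `title` itself is None when the element is missing entirely.
    text = title.text if title is not None else None
    if not text or not text.strip():
        return default
    return text.strip().replace("\n", " ")


untitled = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom"><title type="text"/></entry>'
)
titled = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title type="text"> Hello\nworld </title></entry>'
)

print(entry_title(untitled))  # untitled
print(entry_title(titled))    # Hello world
```

Returning the text (or a fallback string) rather than the element itself would also avoid a `Title: <Element ...>` line in the generated Markdown.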

http://f00lsw1sd0m.blogspot.com/ export: https://drive.google.com/open?id=0B9gbukx7r0LkbHdsZ0tFclRQSkE&authuser=0

The .xml files are correct, but all of the .md files contain the same post, the most "recent" one.

$ python blogger2md.py --output-dir out --html2text blog-11-12-2014.xml
$ head *84143700*
==> index#84143700.html <==
<div class="post-body entry-content" id="post-body-84164917" itemprop="description articleBody">
<p>I have been following <a href="http://subversion.tigris.org/">subversion's</a>&#13;<br/>progress for some time.  With greater interest for the last six months.  Although I appreciate chaos, the majority of me likes to know the elements and risks through structure.&#13;<br/></p><p>&#13;<br/><href>O'Reilly has an excellent article on subversion by one Rafael Garcia-Suarez.  It makes me consider revisiting Linux Journal's article on using cvs for my data, and to organize and set up version control for my data that is currently spread across five computers.&#13;<br/></href></p><p>&#13;<br/>My motivation and goals include learning subversion in detail, experiencing the usability of subversion, and qualifying the usefulness of revision control of data.  A significant aspect for me is that subversion stores information about not just the files, but the directories as well.&#13;<br/></p>
<div style="clear: both;"/>
</div>

==> index#84143700.md <==
Title: <Element {http://www.w3.org/2005/Atom}title at 0x105751290>
Date: 2002-11-6
Tags:

I have been following [subversion's](http://subversion.tigris.org/)
progress for some time. With greater interest for the last six months. Although I appreciate chaos, the majority of me likes to know the elements and risks through structure.

O'Reilly has an excellent article on subversion by one Rafael Garcia-Suarez. It makes me consider revisiting Linux Journal's article on using cvs for my data, and to organize and set up version control for my data that is currently spread across five computers.

==> index#84143700.xml <==
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0"><id>tag:blogger.com,1999:blog-3522281.post-84143700</id><published>2002-11-06T19:36:00.000-05:00</published><updated>2002-11-06T19:36:40.823-05:00</updated><category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/blogger/2008/kind#post"/><title type="text"/><content type="html">I was having a discussion with my friend Sarah, and it became a little political, and I suggested she listen to Billy Bragg's Put Away the Union Jack.  After that discussion, it made he want to check out &lt;href="www.billybragg.com"&gt;Billy's internet presense&lt;/a&gt;.  By pursuing the site, and reading some of the discussions, I found out about &lt;a href="http://www.furthurnet.org"&gt;FurthurNet&lt;/a&gt;.  There seems to be quite the culture of people trade legal, live recordings.</content><link rel="edit" type="application/atom+xml" href="https://www.blogger.com/feeds/3522281/posts/default/84143700"/><link rel="self" type="application/atom+xml" href="https://www.blogger.com/feeds/3522281/posts/default/84143700"/><link rel="alternate" type="text/html" href="http://f00lsw1sd0m.blogspot.com/index.html#84143700" title=""/><author><name>Lloyd Dewolf</name><uri>https://plus.google.com/113996098620973139531</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-aLQ7zHytzsQ/AAAAAAAAAAI/AAAAAAAAALA/qTzUFnuYTfk/s512-c/photo.jpg"/></author></entry>%
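
One possible explanation, not confirmed anywhere in this thread: the entry's `alternate` link is `index.html#84143700`, so posts from this era may differ only in the URL fragment. If the HTML is fetched from that link (an assumption about how the script works), every post would resolve to the same page, and taking the first post body on it would yield the same text each time. The sketch below only illustrates the URL side of that guess; `split_post_url` is a hypothetical helper, and the second link is constructed from the `post-body-84164917` id in the output above rather than taken from the export:

```python
from urllib.parse import urldefrag


def split_post_url(url):
    """Split a post URL into (page URL, fragment). Hypothetical helper."""
    page, fragment = urldefrag(url)
    return page, fragment


links = [
    # Taken from the <link rel="alternate"> of the entry above.
    "http://f00lsw1sd0m.blogspot.com/index.html#84143700",
    # Assumed: built from the post-body id seen in the .html output.
    "http://f00lsw1sd0m.blogspot.com/index.html#84164917",
]

pages = {split_post_url(link)[0] for link in links}
fragments = [split_post_url(link)[1] for link in links]

print(len(pages))   # 1: both links point at the same document
print(fragments)    # ['84143700', '84164917']
```

If that is what is happening, the fragment could be used to pick the matching post out of the shared page, or the `<content>` of each entry in the export could be converted directly instead.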