docintelapp / DocIntel

Open Source Platform for storing, organizing, and searching documents related to cyber threats
https://docintel.org

RSS importer fails when the RSS feed is empty #87

Closed drew3381 closed 12 months ago

drew3381 commented 1 year ago

When the RSS importer runs on an RSS feed that contains no entries (which can happen), it throws an exception and the remaining RSS feeds are not processed.

Example of an empty feed:

<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://www.example.com/rss</id>
  <title>Example</title>
  <updated>2023-07-27T15:25:42.947430+00:00</updated>
  <link href="https://www.example.com/rss"/>
  <generator uri="https://lkiesow.github.io/python-feedgen" version="0.9.0">python-feedgen</generator>
</feed>

Stack trace:

2023-07-27 15:14:09.5670 [INFO] [DocIntel.Core.Importers.RSSSourceImporter] Collecting RSS feed for 'Example' (870175bc-f9c7-402b-96cd-ceb8ac506ecf)
2023-07-27 15:14:09.5755 [ERROR] [DocIntel.Services.Importer.Runner] System.InvalidOperationException
2023-07-27 15:14:09.5755 [ERROR] [DocIntel.Services.Importer.Runner] Sequence contains no elements
   at System.Linq.ThrowHelper.ThrowNoElementsException()
   at System.Linq.Enumerable.Max[TSource,TResult](IEnumerable`1 source, Func`2 selector)
   at DocIntel.Core.Importers.RSSSourceImporter.PullAsync(Nullable`1 lastPull, Int32 limit)+MoveNext() in /src/DocIntel.Core/Importers/RSSSourceImporter.cs:line 154
   at DocIntel.Core.Importers.RSSSourceImporter.PullAsync(Nullable`1 lastPull, Int32 limit)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at DocIntel.Services.Importer.Runner.CollectFeed(Importer feed, AmbientContext feedContext) in /src/DocIntel.Services.Importer/Runner.cs:line 137
   at DocIntel.Services.Importer.Runner.CollectFeed(Importer feed, AmbientContext feedContext) in /src/DocIntel.Services.Importer/Runner.cs:line 137
   at DocIntel.Services.Importer.Runner.ImportFeeds() in /src/DocIntel.Services.Importer/Runner.cs:line 102
   at DocIntel.Services.Importer.Runner.RunAsync(CancellationToken cancellationToken) in /src/DocIntel.Services.Importer/Runner.cs:line 80

Analysis of the issue:

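In DocIntel.Core\Importers\RSSSourceImporter.cs
line 139: rssMetadata.LastPull = feed.Items.Max(_ => _.PublishDate).DateTime;

=> Enumerable.Max throws InvalidOperationException ("Sequence contains no elements") when feed.Items is empty, so the code should check that feed.Items is not empty before executing this line.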
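
A possible guard could look like the sketch below. It is only an illustration, not the project's actual fix: `LatestPublishDate` is a hypothetical helper, and it assumes that `PublishDate` is a `DateTimeOffset` (as the `.DateTime` access suggests) and that keeping the previous `LastPull` value is the desired behavior for an empty feed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of DocIntel: returns the newest publish date,
// or the fallback when there are no items (Max would throw on an empty sequence).
static DateTime? LatestPublishDate(IEnumerable<DateTimeOffset> publishDates, DateTime? fallback)
{
    var dates = publishDates.ToList();
    return dates.Count > 0 ? dates.Max().DateTime : fallback;
}

// Assumed previous LastPull value, for illustration only.
var previous = new DateTime(2023, 7, 1);

// Empty feed: the previous value is kept instead of throwing.
Console.WriteLine(LatestPublishDate(Array.Empty<DateTimeOffset>(), previous) == previous);

// Non-empty feed: the newest publish date is returned.
var items = new[]
{
    new DateTimeOffset(2023, 7, 26, 0, 0, 0, TimeSpan.Zero),
    new DateTimeOffset(2023, 7, 27, 0, 0, 0, TimeSpan.Zero),
};
Console.WriteLine(LatestPublishDate(items, previous));
```

In the importer, the same idea would amount to only updating rssMetadata.LastPull when feed.Items.Any() is true, so that an empty feed is skipped and the remaining feeds are still processed.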