Norconex / collector-core

Collector-related code shared between different collector implementations
http://www.norconex.com/collectors/collector-core/
Apache License 2.0

For comma-separated ExtensionReferenceFilter extensions, ignore whitespace #3

Closed by niels 8 years ago

niels commented 8 years ago

This brings parsing of the ExtensionReferenceFilter configuration in line with other classes such as GenericMetadataChecksummer and MD5DocumentChecksummer, which also ignore whitespace.

It lets you turn a config like this:

  <referenceFilters>
    <filter
      class="${filterExtension}"
      onMatch="exclude"
      caseSensitive="false"
    >3dm,3g2,3gp,7z,8bi,ai,aif,app,asf,asx,avi,bak,bat,bin,bmp,c,cab,cfg,cgi,class,cpl,cpp,cs,cur,dbx,deb,dll,dmg,dmp,drv,drw,dtd,dwg,dxf,eps,exe,fla,flv,fnt,fon,gadget,gam,gho,gif,gpx,gz,hqx,iff,indd,ini,iso,jar,java,jpg,keychain,kml,lnk,m,m3u,m4a,max,mid,mim,mov,mp3,mp4,mpa,mpg,msi,nes,ori,otf,part,pct,pif,pkg,pl,plugin,png,prf,ps,psd,pspimage,py,qxd,qxp,ra,rar,rels,rm,rom,rpm,sav,sit,sitx,svg,swf,sys,thm,tif,tmp,toast,torrent,ttf,uue,vb,vcd,vob,wav,wma,wmv,wsf,xll,yuv,zip,zipx</filter>
  </referenceFilters>

Into this:

  <referenceFilters>
    <filter
      class="${filterExtension}"
      onMatch="exclude"
      caseSensitive="false"
    >
      3dm,
      3g2,
      3gp,
      7z,
      8bi,
      ai,
      aif,
      app,
      asf,
      asx,
      avi,
      […]
    </filter>
  </referenceFilters>
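
For illustration only, here is a minimal sketch of the kind of whitespace-tolerant parsing described above. It is not the actual patch: the class name `ExtensionListParser` and the method `parseExtensions` are hypothetical, and the real ExtensionReferenceFilter code may be structured differently. The idea is to split the element text on commas, trim each token, and skip empty ones, so both config layouts yield the same set of extensions.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of collector-core: shows one way to parse
// a comma-separated extension list while ignoring surrounding whitespace.
final class ExtensionListParser {

    private ExtensionListParser() {
    }

    // Splits on commas, trims spaces/tabs/newlines around each token,
    // drops empty tokens, and lower-cases when matching is case-insensitive.
    static Set<String> parseExtensions(String raw, boolean caseSensitive) {
        Set<String> extensions = new LinkedHashSet<>();
        if (raw == null) {
            return extensions;
        }
        for (String token : raw.split(",")) {
            String ext = token.trim();
            if (ext.isEmpty()) {
                continue; // e.g. a trailing comma or a blank line
            }
            extensions.add(caseSensitive ? ext : ext.toLowerCase(Locale.ROOT));
        }
        return extensions;
    }

    public static void main(String[] args) {
        String compact = "3dm,3g2,3gp,7z";
        String multiline = "\n      3dm,\n      3g2,\n      3gp,\n      7z,\n    ";
        // Both layouts should produce the same set of extensions.
        System.out.println(parseExtensions(compact, false)
                .equals(parseExtensions(multiline, false)));
    }
}
```

With this approach, a trailing comma or blank line in the multi-line layout is simply skipped rather than being treated as an empty extension.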