code4craft / webmagic

A scalable web crawler framework for Java.
http://webmagic.io/
Apache License 2.0
11.45k stars 4.18k forks

Does anyone have a ready-made pipeline that writes webmagic's ResultItems to an Excel file? Thanks. #683

Open 1BOB opened 7 years ago

zyfxgo commented 7 years ago

You can output it in CSV format.

1BOB commented 7 years ago

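I'm a beginner. Can it be exported directly? In scrapy I could export directly; does webmagic require writing code for this? Could you show me the code you wrote?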

1BOB commented 7 years ago

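I've looked at it, but I still don't know how to handle the map -> CSV conversion. Forgive me, I'm still a newbie. Could someone please post the code?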

1BOB commented 7 years ago

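In the process function I have page.putField("goods name", title); page.putField("star level", star); so these entries end up in the ResultItems class, in private Map<String, Object> fields = new LinkedHashMap<String, Object>();. I now need to write goods name and star level to a CSV file, with a header row at the top containing goods name and star level (which I assume is the usual practice). The link you gave doesn't seem to open. Could you please write it for me? Thank you.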
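
For readers with the same question, here is a minimal sketch of the map -> CSV step described above: one header line built from the field names and one data line built from the values. It uses a plain LinkedHashMap as a stand-in for the map that ResultItems holds, and CsvRows, escape, header, and row are hypothetical names for illustration, not part of webmagic.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of webmagic.
class CsvRows {

    // Quote a cell if it contains a comma, a double quote, or a line break,
    // doubling any embedded double quotes (the common CSV convention).
    static String escape(Object value) {
        String s = value == null ? "" : value.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Header line: the field names, in the order they were inserted.
    static String header(Map<String, Object> fields) {
        return fields.keySet().stream().map(CsvRows::escape).collect(Collectors.joining(","));
    }

    // Data line: the field values, in the same order as the header.
    static String row(Map<String, Object> fields) {
        return fields.values().stream().map(CsvRows::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Stand-in for the fields collected via page.putField(...).
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("goods name", "Kettle, 1.5 L");
        fields.put("star level", 4.5);
        System.out.println(header(fields)); // goods name,star level
        System.out.println(row(fields));    // "Kettle, 1.5 L",4.5
    }
}
```

Because a LinkedHashMap iterates in insertion order, the header and the row stay aligned as long as every page puts its fields in the same order.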

1BOB commented 7 years ago

Huh? I didn't quite follow. [smile]

1BOB commented 7 years ago

Please don't hold back; I'd really appreciate a snippet of code. Here is one of the pipelines that ships with the project:

public class FilePipeline extends FilePersistentBase implements Pipeline {

private Logger logger = LoggerFactory.getLogger(getClass());

/**
 * create a FilePipeline with default path"/data/webmagic/"
 */
public FilePipeline() {
    setPath("/data/webmagic/");
}

public FilePipeline(String path) {
    setPath(path);
}

@Override
public void process(ResultItems resultItems, Task task) {
    String path = this.path + PATH_SEPERATOR + task.getUUID() + PATH_SEPERATOR;
    try {
        PrintWriter printWriter = new PrintWriter(new OutputStreamWriter(new FileOutputStream(getFile(path + DigestUtils.md5Hex(resultItems.getRequest().getUrl()) + ".html")),"UTF-8"));
        printWriter.println("url:\t" + resultItems.getRequest().getUrl());
        for (Map.Entry<String, Object> entry : resultItems.getAll().entrySet()) {
            if (entry.getValue() instanceof Iterable) {
                Iterable value = (Iterable) entry.getValue();
                printWriter.println(entry.getKey() + ":");
                for (Object o : value) {
                    printWriter.println(o);
                }
            } else {
                printWriter.println(entry.getKey() + ":\t" + entry.getValue());
            }
        }
        printWriter.close();
    } catch (IOException e) {
        logger.warn("write file error", e);
    }
}

}

1BOB commented 7 years ago

I managed to open the link you gave me and I'm working through it. Thank you.

1BOB commented 7 years ago

How is the resultItems.keySet() function written? Could you post it?
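
A possible answer, offered as a sketch: as far as I can tell, ResultItems does not define keySet() itself, but the bundled FilePipeline quoted earlier in this thread calls resultItems.getAll(), which returns the underlying Map<String, Object>, so the standard Map.keySet() is available on that. The example below uses a plain LinkedHashMap in place of resultItems.getAll(); FieldNames and namesOf are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of webmagic.
class FieldNames {

    // Returns the keys of a field map as an array, in insertion order.
    // Passing new String[0] lets toArray allocate an array of exactly
    // the right size, so there are no trailing null entries.
    static String[] namesOf(Map<String, Object> fields) {
        return fields.keySet().toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for resultItems.getAll().
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("goods name", "Kettle");
        fields.put("star level", 4.5);
        System.out.println(Arrays.toString(namesOf(fields))); // [goods name, star level]
    }
}
```

In a pipeline the call would presumably be namesOf(resultItems.getAll()).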

1BOB commented 7 years ago

public class CSVFilePipeline extends FilePersistentBase implements Pipeline {

private Logger logger = LoggerFactory.getLogger(getClass());

/**
 * create a CSVFilePipeline with default path "/data/csv/"
 */
public CSVFilePipeline() {
    setPath("/data/csv/");
}

public CSVFilePipeline(String path) {
    setPath(path);
}

private final static String[] UN_INIT = new String[0];

private String CSV_SEPERATOR = ",";

private String[] names = UN_INIT;

public synchronized void init(String[] objects) {
    System.out.println("objects is " + objects.toString() + ", UN_INIT is " + UN_INIT.toString());

    // if (objects == UN_INIT) {
    if (true) { // the objects above never seem to be equal; I'll sort that out later, for now just let it through once
        this.names = objects;
    } else {
        logger.warn("names is already init", new UnsupportedOperationException());
    }
}

public String[] getCacheArray(int x) {
    String[] cacheArray = new String[x];
    for (int i = 0; i < x; i++) {
        cacheArray[i] = new String("");
    }
    return cacheArray;
}

@Override
public void process(ResultItems resultItems, Task task) {
    if (names == UN_INIT) {
        String[] strArray = new String[6];
        init(resultItems.getAll().keySet().toArray(strArray));
    }

    // String path = this.path + PATH_SEPERATOR + task.getUUID() + PATH_SEPERATOR;
    String path = "C:\\Users\\Administrator\\Desktop";
    System.out.println("000000000000000000000000000000000current path is " + path);
    try {
        System.out.println("about to create the file");
        PrintWriter printWriter = new PrintWriter(new OutputStreamWriter(new FileOutputStream(getFile(path + DigestUtils.md5Hex(resultItems.getRequest().getUrl()) + ".csv")),"UTF-8"));
        String[] cache = getCacheArray(names.length);
        for (int i = 0; i < names.length; i++) {
            Object value = resultItems.get(names[i]);
            if (value != null) {
                cache[i] = value.toString();
            }
        }
        for (int i = 0; i < names.length; i++) {
            if (i != 0) {
                printWriter.print(CSV_SEPERATOR);
            }
            printWriter.print(cache[i]);
        }
        printWriter.close();
    } catch (IOException e) {
        logger.warn("write file error", e);
    }
}

}

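I tweaked it a bit and changed the path to the desktop, but no .csv file shows up on the desktop. Single-stepping through it doesn't show me what's wrong either. Any pointers would be appreciated.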
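
One possible cause, judging only from the code as posted: the desktop path has no trailing separator, so path + md5 + ".csv" names a file called Desktop<hash>.csv inside C:\Users\Administrator rather than a file inside the Desktop folder. The pipeline also creates a separate file for every page and never writes a header row. The sketch below shows one way to address both, assuming a single shared CSV file is wanted; CsvAppender, append, and items.csv are hypothetical names, and the demo writes to a temporary directory instead of the desktop.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper for illustration; not part of webmagic.
class CsvAppender {

    // Appends one row to dir/fileName. The header is written first
    // only when the file does not exist yet or is still empty.
    static synchronized void append(File dir, String fileName, String header, String row)
            throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("cannot create directory " + dir);
        }
        // new File(parent, child) inserts the separator, unlike plain string concatenation.
        File file = new File(dir, fileName);
        boolean needHeader = !file.exists() || file.length() == 0;
        // The second constructor argument (true) opens the file in append mode.
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(file, true), StandardCharsets.UTF_8))) {
            if (needHeader) {
                out.println(header);
            }
            out.println(row);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("csv-demo").toFile();
        append(dir, "items.csv", "goods name,star level", "Kettle,4.5");
        append(dir, "items.csv", "goods name,star level", "Mug,5");
        System.out.println("wrote " + new File(dir, "items.csv").getPath());
    }
}
```

Inside process, the header and row strings would be built from names and cache, and the method is synchronized because a pipeline may be called from several crawler threads.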

1BOB commented 7 years ago

Thank you, thank you.

wanfengsky commented 6 years ago

How was this resolved?