TrevorAshby / CodeRLHF


Dataset download & extraction #1

Closed TrevorAshby closed 8 months ago

TrevorAshby commented 9 months ago

This script should also only download / include specified languages: [Python, C, C++].

If possible, make the list of kept languages a variable.

Willlmcc commented 9 months ago

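`#!/bin/sh

# Download the full Project CodeNet archive.
curl 'https://dax-cdn.cdn.appdomain.cloud/dax-project-codenet/1.0.0/Project_CodeNet.tar.gz' > datasetIBM.tar.gz

# Extract the metadata directory.
tar xvfz datasetIBM.tar.gz Project_CodeNet/metadata

# Extract each language listed in languages.txt (one per line). The pattern is quoted so that tar, not the shell, expands it; GNU tar needs --wildcards for this.
while read -r p; do
  echo "$p"
  tar xvfz datasetIBM.tar.gz --wildcards "Project_CodeNet/data/*/$p/*"
done < languages.txt`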

To choose the languages, create a languages.txt file listing the wanted languages, one per line. The file must be named languages.txt and be readable from the directory where the shell script runs. For example, languages.txt:

C
C++
Python
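As a possible alternative to the languages.txt file, the kept languages could live in a shell variable, as the issue asks. The sketch below is only an illustration under a few assumptions: the names `LANGUAGES`, `language_pattern`, and `extract_languages` are hypothetical and not part of the posted script, the archive is assumed to be already downloaded, and `--wildcards` assumes GNU tar.

```shell
#!/bin/sh
# Sketch only: LANGUAGES, language_pattern and extract_languages are
# hypothetical names, not part of the script posted above.

# Space-separated list of languages to keep; can be overridden from the environment.
LANGUAGES="${LANGUAGES:-Python C C++}"

# Print the archive member pattern for one language directory.
language_pattern() {
  printf 'Project_CodeNet/data/*/%s/*' "$1"
}

# Extract every language in $LANGUAGES from an already downloaded archive
# in a single tar pass. Assumes GNU tar (--wildcards).
extract_languages() {
  archive=$1
  set --   # reuse the positional parameters as the pattern list
  for lang in $LANGUAGES; do
    set -- "$@" "$(language_pattern "$lang")"
  done
  tar xzf "$archive" --wildcards "$@"
}
```

With these definitions loaded, something like `LANGUAGES="C Python"` followed by `extract_languages datasetIBM.tar.gz` would extract only those two languages. Passing all patterns to one tar call avoids re-reading the large archive once per language.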