dimiboeckaerts / PhageHostLearn

Pairwise machine learning models for phage-host interaction prediction
MIT License

EmptyDataError: No columns to parse from file #1

Open J-JEMINA opened 1 year ago

J-JEMINA commented 1 year ago

Hey! I am trying to run this in Google Colab, and the 2.1 code block gave me this issue:

```
ValueError                                Traceback (most recent call last)
<ipython-input> in <cell line: 3>()
      1 phage_genomes_path = general_path+'/phages_genomes'
      2 phanotate_path = '/usr/local/lib/python3.10/dist-packages/phanotate.py'
----> 3 phlp.phanotate_processing(general_path, phage_genomes_path, phanotate_path, data_suffix=data_suffix)

/content/drive/MyDrive/PhageHostLearn/code/phagehostlearn_processing.py in phanotate_processing(general_path, phage_genomes_path, phanotate_path, data_suffix, add, test)
    229     """
    230     phage_files = listdir(phage_genomes_path)
--> 231     phage_files.remove('.DS_Store')
    232     if add == True:
    233         RBPbase = pd.read_csv(general_path+'/RBPbase'+data_suffix+'.csv')

ValueError: list.remove(x): x not in list
```

So I modified the phagehostlearn_processing.py code a little bit, to this:

```python
if '.DS_Store' in phage_files:
    phage_files.remove('.DS_Store')
```
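A slightly more defensive variant (my own sketch, not code from the repo) sidesteps `.DS_Store` and any other hidden or stray files entirely by keeping only files with genome FASTA extensions; the helper name and extension list here are my own assumptions:

```python
import os

def list_genome_files(path, extensions=('.fasta', '.fa', '.fna')):
    """Return only genome FASTA files, skipping hidden files like .DS_Store."""
    return sorted(
        f for f in os.listdir(path)
        if f.endswith(extensions) and not f.startswith('.')
    )
```

This way the listing works the same on macOS, Linux, and Colab, with no special case needed for any particular hidden file.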

But now I am receiving this error:

```
EmptyDataError                            Traceback (most recent call last)
<ipython-input> in <cell line: 3>()
      1 phage_genomes_path = general_path+'/phages_genomes'
      2 phanotate_path = '/usr/local/lib/python3.10/dist-packages/phanotate.py'
----> 3 phlp.phanotate_processing(general_path, phage_genomes_path, phanotate_path, data_suffix=data_suffix)

8 frames
/content/drive/MyDrive/PhageHostLearn/code/phagehostlearn_processing.py in phanotate_processing(general_path, phage_genomes_path, phanotate_path, data_suffix, add, test)
    253         temp_tab.write(split + b'\n')
    254     temp_tab.close()
--> 255     results_orfs = pd.read_csv(general_path+'/phage_results.tsv', sep='\t', lineterminator='\n', index_col=False)
    256
    257     # fill up lists accordingly

/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    209             else:
    210                 kwargs[new_arg_name] = new_arg_value
--> 211         return func(*args, **kwargs)
    212
    213     return cast(F, wrapper)

/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331         return func(*args, **kwargs)
    332
    333     # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    948     kwds.update(kwds_defaults)
    949
--> 950     return _read(filepath_or_buffer, kwds)
    951
    952

/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py in _read(filepath_or_buffer, kwds)
    603
    604     # Create the parser.
--> 605     parser = TextFileReader(filepath_or_buffer, **kwds)
    606
    607     if chunksize or iterator:

/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py in __init__(self, f, engine, **kwds)
   1440
   1441         self.handles: IOHandles | None = None
-> 1442         self._engine = self._make_engine(f, self.engine)
   1443
   1444     def close(self) -> None:

/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py in _make_engine(self, f, engine)
   1751
   1752         try:
-> 1753             return mapping[engine](f, **self.options)
   1754         except Exception:
   1755             if self.handles is not None:

/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/c_parser_wrapper.py in __init__(self, src, **kwds)
     77
     78         kwds["dtype"] = ensure_dtype_objs(kwds.get("dtype", None))
---> 79         self._reader = parsers.TextReader(src, **kwds)
     80
     81         self.unnamed_cols = self._reader.unnamed_cols

/usr/local/lib/python3.10/dist-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

EmptyDataError: No columns to parse from file
```
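For context, pandas raises `EmptyDataError` when the file it is asked to parse contains no data at all, so the `phage_results.tsv` that the pipeline wrote is almost certainly empty (i.e., PHANOTATE produced no output). A small guard around the read, sketched here with a hypothetical helper name, turns the cryptic pandas error into an explicit message:

```python
import os
import pandas as pd

def read_phanotate_results(tsv_path):
    """Read the PHANOTATE results TSV, failing loudly if the file is empty."""
    if not os.path.exists(tsv_path) or os.path.getsize(tsv_path) == 0:
        raise RuntimeError(
            f"{tsv_path} is missing or empty - PHANOTATE likely produced no "
            "output (check that it actually ran on your genomes)."
        )
    return pd.read_csv(tsv_path, sep='\t', lineterminator='\n', index_col=False)
```

With this guard, the failure points back to the PHANOTATE step instead of surfacing deep inside the pandas parser.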

I don't think this is caused by the change I made to the code, but I would like some help in resolving it.

Any help would be highly appreciated! @dimiboeckaerts

Thank you!

dimiboeckaerts commented 1 year ago

Hi again Jemina, thank you for pointing out the small bug in the code, I will adjust it accordingly! The remaining problem actually relates to the comment I made on the other GitHub repo: here you are in fact using PHANOTATE to process a phage genome into its CDSs. My best guess is that running PHANOTATE on Google Colab does not simply work like that. It should theoretically be able to run in Colab, but perhaps not the way I coded it (I haven't checked this). You can easily run PHANOTATE on your local computer, it's quite fast and does not take a lot of memory. Same for Kaptive, you can just run it locally as well. As you can see, running the entire pipeline is not very straightforward at this point. I aim to simplify it further and make it more user-friendly once it is published!
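To illustrate running PHANOTATE locally per genome, here is a minimal sketch of a subprocess wrapper. The function name is my own, and it assumes the PHANOTATE script writes its ORF table to stdout with no extra flags; check your installed version's CLI options before relying on this:

```python
import subprocess
import sys

def run_phanotate(phanotate_script, genome_fasta, out_tsv):
    """Run a PHANOTATE script on one genome, capturing its table to a file.

    Assumes `phanotate_script` is a Python script that writes ORF
    predictions to stdout (flags may differ per PHANOTATE version).
    """
    with open(out_tsv, 'w') as handle:
        result = subprocess.run(
            [sys.executable, phanotate_script, genome_fasta],
            stdout=handle, stderr=subprocess.PIPE, text=True,
        )
    if result.returncode != 0:
        raise RuntimeError(
            f"PHANOTATE failed on {genome_fasta}: {result.stderr}"
        )
    return out_tsv
```

Looping this over each genome file locally, then uploading the resulting TSVs to Colab, would decouple the slow/fragile gene-calling step from the rest of the pipeline.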

J-JEMINA commented 1 year ago

After a short break, I returned to this project and found that PHANOTATE had been updated to produce .faa files. This made my work easy: I have now processed my 248 genomes and have 248 .faa files in hand. Can I please get some help in proceeding further with these to produce embedding files and detect RBPs? @dimiboeckaerts
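As a starting point for the embedding step, the protein records first have to be read out of those .faa files. A minimal dependency-free FASTA parser (my own sketch, not the repo's code) that collects (header, sequence) pairs ready to feed into an embedding model could look like this:

```python
def read_faa(path):
    """Parse a protein FASTA (.faa) file into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith('>'):
                # flush the previous record before starting a new one
                if header is not None:
                    records.append((header, ''.join(chunks)))
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        records.append((header, ''.join(chunks)))
    return records
```

The same result can be obtained with `Bio.SeqIO.parse(path, "fasta")` if Biopython is already installed.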

Also, while learning about the tools you have used, I came across Kaptive, which is trained for Klebsiella genomes. So if I want to use this module for other bacterial species, Kaptive will not work, right? That's my understanding. Is there a more generalized tool available that would make it easier to adapt the pipeline to other bacterial species, or do I have to modify the bacterial genome processing for my bacteria separately? @dimiboeckaerts