crimBubble / ECCsplorer

The ECCsplorer is a bioinformatics pipeline for the automated detection of extrachromosomal circular DNA (eccDNA) from paired-end read data of amplified circular DNA.
GNU General Public License v3.0

some suggestions #1

Closed panxiaoguang closed 2 years ago

panxiaoguang commented 3 years ago

Hi,

Thanks for this useful tool. I have a few questions and suggestions from using it.

1. It always copies the raw reads and the reference into the output directory, which wastes a lot of disk space. If the copies are necessary, could they be replaced with symbolic links?

2. Related to the previous point, every run re-indexes the reference sequence, which wastes a lot of time. If I want to process multiple samples, the indexing is repeated for each one.

crimBubble commented 3 years ago

Thanks for your feedback. Yes, the current approach is space-intensive, but it should prevent users from messing up their raw data. Nevertheless, I will consider using symbolic links in future updates. If you are working on multiple samples, you can prepare the folder structure and use symbolic links for the reference data and index files. The pipeline checks for existing files and skips the indexing if they are available.
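The symbolic-link workaround described here could be scripted roughly as follows. This is only a sketch: the helper name `link_shared_files` and the folder names in the usage note are made up for illustration, and the location where ECCsplorer actually expects reference and index files should be checked against a real run before relying on it.

```python
import os
from pathlib import Path


def link_shared_files(shared_files, sample_dir):
    """Symlink shared reference/index files into one sample's folder.

    Hypothetical helper: the target folder layout is an assumption,
    not something taken from the ECCsplorer documentation.
    """
    sample_dir = Path(sample_dir)
    sample_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for src in map(Path, shared_files):
        dst = sample_dir / src.name
        # Leave anything that already exists (file or link) untouched.
        if not dst.exists() and not dst.is_symlink():
            os.symlink(src.resolve(), dst)
        links.append(dst)
    return links
```

Calling it once per sample, for example `link_shared_files(["ref.fa", "ref.fa.idx"], "sample1/reference_data")` with placeholder file and folder names, would give each output folder links instead of copies, so the existing-file check can skip re-indexing.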

panxiaoguang commented 3 years ago

Thanks for your reply. I have another question. At the end of the log I got "Sorry, something went wrong (exit_err)", and I can't tell at which step the error occurred. In my output directory I can't find "TR_peak-region-all.bed", so maybe it happened in get_rough_coverage. I think the error log should be more precise.
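One way step-level error reporting of this kind could look is sketched below with Python's standard `logging` module. The decorator `log_step`, the logger name, and the stand-in body of `get_rough_coverage` are illustrative assumptions, not the pipeline's actual code.

```python
import functools
import logging

logger = logging.getLogger("pipeline_sketch")  # hypothetical logger name


def log_step(func):
    """Log which step failed (with traceback) before re-raising."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("Starting step: %s", func.__name__)
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("Step failed: %s", func.__name__)
            raise
    return wrapper


@log_step
def get_rough_coverage():
    # Stand-in body; the real step's logic is not shown in this thread.
    raise FileNotFoundError("TR_peak-region-all.bed")
```

With a wrapper like this, the final log entry would name the failing step instead of only reporting that something went wrong.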

panxiaoguang commented 3 years ago

Hello, I have another error:

2021-04-18 01:14:00,527 - [run_discordantread_detect] INFO: Merging and cleaning up regions.
Traceback (most recent call last):
  File "ECCsplorer.py", line 815, in <module>
    main()
  File "ECCsplorer.py", line 775, in main
    sum_mapper_win_coverage, sum_mapper_candidate_fas, analysis_errors = obj_mapper.mapper_coordinator()
  File "/dellfsqd2/ST_LBI/USER/panxiaoguang/app/ECCsplorer/lib/eccMapper.py", line 647, in mapper_coordinator
    self.run_discordantread_detect()
  File "/dellfsqd2/ST_LBI/USER/panxiaoguang/app/ECCsplorer/lib/eccMapper.py", line 437, in run_discordantread_detect
    min_coverage_allowed = int(int(max_coverage) * config.BACKGROUND_PERC)
ValueError: invalid literal for int() with base 10: b'1.62371e+06\n'

crimBubble commented 3 years ago

Thanks again for your feedback. I am not a bioinformatician, and this is my first project in Python. As I am still developing my Python skills, feedback is really appreciated. I will come back to improving the logging in future updates, but it will probably not be the highest priority. For your other error, try editing lib/eccMapper.py and changing line 437 to min_coverage_allowed = int(float(max_coverage) * config.BACKGROUND_PERC). I hope this solves the error.
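The reason for the suggested change can be reproduced in isolation: `int()` cannot parse scientific notation, while `float()` can. The sketch below uses the byte string from the traceback above; the function wrapper and the fraction `0.5` are stand-ins, since the real value of `config.BACKGROUND_PERC` is not shown in this thread.

```python
def min_coverage_allowed(raw_max_coverage, background_perc):
    """Turn a textual maximum coverage into an integer threshold."""
    if isinstance(raw_max_coverage, bytes):
        raw_max_coverage = raw_max_coverage.decode()
    # float() understands scientific notation such as "1.62371e+06";
    # int() only accepts plain integer literals.
    max_coverage = float(raw_max_coverage.strip())
    return int(max_coverage * background_perc)


raw = b"1.62371e+06\n"  # value taken from the traceback above

try:
    int(raw)
except ValueError as err:
    print("int() fails:", err)

# 0.5 is an arbitrary stand-in for config.BACKGROUND_PERC.
print(min_coverage_allowed(raw, 0.5))
```

Decoding and stripping explicitly is a defensive choice in this sketch; the one-line change suggested above relies on `float()` handling the raw value directly.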

panxiaoguang commented 3 years ago

Sorry, after fixing line 437 I get another error:

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 32093083 and the array at index 1 has size 9672810

crimBubble commented 3 years ago

This error does not seem to be related to the previous one. Can you post the whole error message?

panxiaoguang commented 3 years ago

Sure, the error message is:

2021-04-20 19:25:13,688 - [mapper_coordinator] INFO: Calculating rough coverage and find peaks.
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21940; MultiID 03)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21940)
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21932; MultiID 01)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21932)
No peaks in TR_map-all.COVERAGE (ProcessID 21932)
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21937; MultiID 06)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21937)
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21934; MultiID 08)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21934)
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21941; MultiID 09)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21941)
No peaks in TR_map-all.COVERAGE (ProcessID 21941)
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21939; MultiID 04)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21939)
No peaks in TR_map-all.COVERAGE (ProcessID 21939)
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21936; MultiID 00)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21936)
No peaks in TR_map-all.COVERAGE (ProcessID 21936)
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21933; MultiID 02)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21933)
No peaks in TR_map-all.COVERAGE (ProcessID 21933)
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21935; MultiID 07)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21935)
No peaks in TR_map-all.COVERAGE (ProcessID 21935)
Calculating mean coverage of TR_map-all.MAPPING (ProcessID 21938; MultiID 05)
Calculating peaks in TR_map-all.COVERAGE (ProcessID 21938)
No peaks in TR_map-all.COVERAGE (ProcessID 21938)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24184; MultiID 04)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24173; MultiID 01)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24176; MultiID 07)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24178; MultiID 06)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24185; MultiID 03)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24174; MultiID 02)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24186; MultiID 09)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24183; MultiID 05)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24177; MultiID 00)
Calculating mean coverage of CO_map-all.MAPPING (ProcessID 24175; MultiID 08)
Traceback (most recent call last):
  File "ECCsplorer.py", line 815, in <module>
    main()
  File "ECCsplorer.py", line 775, in main
    sum_mapper_win_coverage, sum_mapper_candidate_fas, analysis_errors = obj_mapper.mapper_coordinator()
  File "/dellfsqd2/ST_LBI/USER/panxiaoguang/app/ECCsplorer/lib/eccMapper.py", line 748, in mapper_coordinator
    axis=1)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 32093083 and the array at index 1 has size 9672810
2021-04-20 20:44:48,152 - [r_shutdown] INFO: Shutting down Rserve.
2021-04-20 20:44:48,156 - [exit_err] ERROR: Sorry, something went wrong.

One more point: it always reports that there are no peaks!

crimBubble commented 3 years ago

This looks like an unhandled exception. I will investigate and hopefully fix it in a future update.
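For reference, the NumPy error itself can be reproduced with two small column arrays of different lengths. The sketch below does not diagnose the pipeline; the array names, the sizes, and the `join_columns` guard are hypothetical and only show why `concatenate` with `axis=1` needs equal row counts.

```python
import numpy as np


def join_columns(first, second):
    """Concatenate two column arrays side by side, checking lengths first."""
    if first.shape[0] != second.shape[0]:
        raise ValueError(
            f"row counts differ: {first.shape[0]} vs {second.shape[0]}"
        )
    return np.concatenate((first, second), axis=1)


# Hypothetical stand-ins for two coverage columns of unequal length.
treatment = np.zeros((5, 1))
control = np.zeros((3, 1))

try:
    np.concatenate((treatment, control), axis=1)
except ValueError as err:
    print("NumPy rejects the mismatch:", err)
```

An explicit length check like the one in `join_columns` would at least turn the failure into a message that names the mismatched inputs.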